Active Topology Inference using Network Coding 

Pegah Sattari, IEEE Student Member, Christina Fragouli, IEEE Member, and Athina Markopoulou, IEEE Member 






Abstract — Our goal, in this paper, is to infer the topology of 
a network when (i) we can send probes between sources and 
receivers at the edge of the network and (ii) intermediate nodes 
can perform simple network coding operations, i.e., additions. 
Our key intuition is that network coding introduces topology- 
dependent correlation in the observations at the receivers, which 
can be exploited to infer the topology. For tree topologies, 
we design hierarchical clustering algorithms, building on our 
prior work in [1]. For directed acyclic graphs (DAGs), we first
decompose the topology into a number of two source, two receiver 
subnetwork components and then we merge these components to 
reconstruct the topology. Our approach for DAGs builds on prior 
work on tomography [2], and improves upon it by employing
network coding to accurately distinguish among all different 2- 
by-2 components. We evaluate our algorithms through simulation 
of a number of realistic topologies. 

Index Terms — Network Coding, Topology Inference. 



I. Introduction 

Knowledge of network topology is important for network 
management, diagnosis, operations, security and performance 
optimization. Depending on the context, "topology" may refer 
to different layers, such as the Internet's router-level topology, 
an overlay network or a peer-to-peer topology, a wireless ad-hoc
network topology, etc. Topology may be unavailable for
various reasons, e.g., because operators do not want
to reveal the internal characteristics of their network to the
outside world, or because the topology changes frequently.

Due to the importance of network topology, a large body of 
prior work has focused on measuring network topology.
One family of techniques is based on traceroute-like
measurements, which collect the ids of nodes along
traceroute paths and use them to reconstruct the topology.
Another family of techniques is tomographic: network
tomography aims at inferring internal network characteris- 
tics, including topology, using end-to-end probes, putting the 
processing burden on a few end-nodes and keeping internal 
nodes simple. More specifically, multicast or unicast probes 
are sent and received between sets of nodes and the topology 
is inferred based on the number and order of received probes. 
In this paper, we revisit the problem of topology inference 
using end-to-end probes, but in a network with network coding 
capabilities. 

The network coding paradigm is based on the idea that in- 
termediate nodes linearly combine packets and receivers solve 
a system of linear equations to recover the original packets. 
This idea has been shown to bring benefits, for example in 
terms of throughput, complexity and reliability. The network
coding idea is well-matched to content distribution over
peer-to-peer networks, overlay networks, wireless multihop, etc. In
this paper, we consider networks that are already equipped 
with simple network coding capabilities and we design active 
probing schemes that make use of these capabilities to improve 
topology inference. 

Our key intuition is that network coding introduces 
topology-dependent correlation in the content of packets ob- 
served at the receivers, which can then be exploited to reverse- 
engineer the topology. For example, a coding point, i.e., a 
node that combines one or more incoming packets, introduces 
correlation between packets coming from different sources, 
in a similar way that multicast introduces correlation in the 
packets sent by the same source and received by several 
receivers. In fact, this correlation introduced by multicast
has been the starting point and the main idea underlying
tomographic topology inference. Subsequent schemes made
this idea more practical by emulating multicast with
back-to-back unicast probes [2], [3]. In contrast, relating probes from
different sources, in order to reveal intermediate nodes, has 
been a challenge in the tomography literature. Using simple 
network coding operations at coding points solves this problem 
and allows accurate and fast topology inference. 

First, we consider undirected trees, where leaves can act 
as both sources and receivers of probe packets, and we de- 
sign hierarchical clustering algorithms that infer the topology, 
building on our prior work in [1]. Then, we consider general
graphs, in particular directed acyclic graphs (DAGs) with a 
fixed set of M sources and N receivers and a pre-determined 
routing scheme. We first decompose the topology into a num- 
ber of two source, two receiver subnetwork components and 
then we merge these components to reconstruct the topology. 
Our approach for DAGs builds on prior work on tomography 
[2], but improves upon it by employing simple network coding
operations at intermediate nodes (just additions) to
deterministically distinguish among all candidate 2-by-2 subnetwork
components, which was impossible without network coding
[2], [3]. We evaluate our algorithms through simulation over
a number of topologies and we show that they can infer the 
topology accurately and faster than traditional approaches. 

The structure of the rest of the paper is as follows. Section II
discusses related work. Section III presents our assumptions,
notation and problem statement. Section IV presents
algorithms for inferring tree topologies. Binary trees are discussed
in Section IV-A, in the absence (Section IV-A1) or presence
(Section IV-A2) of packet loss. General trees are discussed in
Sections IV-B1 and IV-B2. Section V presents algorithms for
inferring general directed acyclic graphs (DAGs). Section V-B
presents our algorithms for inferring 2-by-2 subnetwork
components, in the absence (Section V-B1) or presence
(Section V-B2) of packet loss. Section V-C explains how to merge
these components to reconstruct the entire topology.
Section VI provides simulation results for random graphs as well
as real Internet topologies. Section VII provides an in-depth
comparison and makes connections between our approach
and alternative topology inference approaches. Section VIII
concludes the paper.

II. Related Work 

There are two bodies of related work: one from the network 
tomography and another from the network coding literature. 

A good survey of network tomography can be found in 
0. In our work, we are interested in inferring the topology 
based on end-to-end probes, rather than link-level character- 
istics. An early work on topology inference using end-to-end 
measurements is 0, where the correlation between end-to-end 
multicast packet loss patterns was used to infer the topology 
of binary trees. The correctness of this idea was rigorously 
established in [6], and was extended to general trees and to
measurements other than loss, such as delay variance Q, or 
more generally any metric that grows monotonically with the 
number of traversed links. The idea has also been extended to 
unicast probes 0, 0. In summary, tomographic schemes for 
topology inference use end-to-end active probing and feed the 
number, order or a monotonic property (e.g., delay variance or
loss) of received probes as input to statistical signal-processing
techniques. Inference of link characteristics [9] can also be
combined with topology inference 0. 

Most tomographic techniques rely on probes sent from 
a single source in a tree topology 0-[8], [10]-[11]. The
work by Rabbat et al. introduced the multiple-source multiple- 
destination (M-by-N) tomography problem, i.e., sending 
probes between M sources and N destinations [2], [3]. They
showed that an M-by-N topology can be decomposed into a 
collection of 2-by-2 components. They proposed to coordinate 
transmission of multi-packet probes from two sources and 
to measure the packet arrival order at the two receivers in 
order to infer some information about the 2-by-2 topology. 
Assuming knowledge of M 1-by-N topologies and of all 2- 
by-2 components, they also showed how to merge a second 
source's 1-by-N topology with the first. The resulting M-by-N 
topology is not exact, but provides bounds on the locations 
of joining points with respect to the branching points. It also 
requires a large number of probes, as do all approaches that 
need to collect enough probes for statistical significance 0, 
181. ITT211, IfBI ItTTl. Ifl8l Our work on DAGs builds on and 
extends the multiple-source multiple-destination work in [2],
[3], but uses network coding to achieve exact and fast topology
inference. 

Independently, the field of network coding [19], [20]
advocates that intermediate nodes mix packets and receivers
solve a system of equations. Linear network coding es- 
sentially translates network topology to the corresponding 
transfer matrix from the sources to the receivers (the end- 
points). Recently, network coding ideas have been applied to 
tomography problems, in order to exploit this intimate relation 
between topology and end-to-end observations. In [21], [22],
we revisited link-loss (but not topology) tomography using
active probing and network coding. In the first part of this
paper, we extend our preliminary work in [1], where we
showed that active probes from two sources and XOR at
intermediate nodes are sufficient to infer the topology of a
binary tree. This approach generalizes to general trees, but
not to general graphs. In [23], we used a different approach
for general graphs, which is closer to the work by Rabbat et
al. [2], [3]: we identify 2-by-2 components and merge them
together in an M-by-N topology. This journal paper combines
and extends our preliminary work in [1], [23].

The following papers consider random network coding, 
for the purpose of information transfer, and perform passive 
topology inference on the side. In [24], passive techniques
have been used to distinguish among failure patterns. In [25]-[27],
subspace properties at various nodes have been used for
topology inference and error localization. In [25], [28], [29],
each node passively infers the upstream network topology at
no cost to throughput but at high complexity. In contrast, we
propose active probing and a simple coding scheme at
intermediate nodes, to achieve low-complexity topology inference at
the end nodes.¹ Furthermore, we do not require the end-points
to have any a-priori knowledge of the identity or operations of
the intermediate nodes. In Section VII, we provide a detailed
comparison and make connections between active and passive 
topology inference. 

Finally, the predominantly employed approach to Internet
topology inference today is based on traceroute [30]-[39].
Multiple traceroutes are sent among monitoring hosts,
which record router ids along paths, and this information is put
together to reconstruct the graph. The traceroute-based
approach is discussed in detail in Section VII-D.

III. Problem Statement and Model 

Topology. In the first part of the paper, we consider
undirected trees: there are $n$ vertices, $n - 1$ edges that can be
used in both directions, and exactly one path between any two
vertices. We denote by $\mathcal{L} = \{1, 2, \ldots, L\}$ the leaf-vertices of
the tree, which correspond to end-hosts that can act as sources
or receivers of probe packets.

In the second part, we consider directed acyclic graph 
(DAG) topologies between M sources and N receivers, which 
we refer to as M-by-N topology, following the terminology of 
[2], [3]. Without loss of generality, we present most of our discussion in terms of
M = 2, i.e., inferring a 2-by-N topology; an M-by-N topology 
can be constructed by merging smaller structures. 

Similarly to [2], [3], we also assume that a predetermined
routing policy maps each source-destination pair to a single
route from the source to the destination. This implies the
following three properties, first stated in [2].
A1 There is a unique path from each source to each receiver.
A2 Two paths from the same source to different receivers
take the same route until they branch, so that all 1-by-2
components have the "inverted Y" structure; the node
where the paths to the two receivers split is called a
branching point, B.
A3 Two paths from different sources to the same receiver
use exactly the same set of links after they join, so that
all 2-by-1 components have the "Y" structure; the node
where the paths from the two sources merge is called a
joining point, J.

¹We would like to note that, although we present our scheme as an active
scheme in this paper, it can potentially be implemented as a passive scheme,
if network coding operations are chosen to meet both network coding (e.g.,
independence w.h.p.) and tomographic (maintaining the properties described in
Section VII) goals at the same time.

These properties are consistent with the routing behavior
in the Internet: the next hop taken by a packet is determined
by a routing table lookup on the destination address. Each
subnetwork from one source to the N receivers forms a 1-by-N
tree; thus, the general graph is a "multiple-tree" network
0.

We are interested in inferring logical topologies, which 
are specified by the branching and joining points where the 
measured end-to-end paths meet. Intermediate nodes in a 
logical topology have degree at least three, and in-degree 
and out-degree at least one. Because this is necessary for 
identifiability, focusing on logical topologies is a standard 
assumption in topology inference problems. 

Delay and loss. Link delay has a fixed part, e.g., the 
propagation and transmission delay, and a random part, e.g., 
the queueing delay. Path delay is the sum of delays across 
the links in the path. In our simulations, we consider US-wide 
Internet topologies, with link delays up to tens of milliseconds 
(ms). We assume a coarse synchronization (i.e., on the order 
of 5-10ms) across network nodes, which is easily achievable 
via a handshaking scheme such as NTP. The rationale is to 
allow sources and intermediate nodes to operate in timeslots, 
of duration T and W, respectively, which are considerably longer
than link delays. We also consider scenarios with
and without packet loss. 

Problem Statement. Our goal, in this paper, is to design 
active probing schemes, i.e., the operation of sources,
intermediate nodes and receivers, that will allow us to infer the
logical topology from the observations at the receivers. 

We restrict the space of possible operations to the simple 
options described in the rest of this section. In later sections, 
we design schemes based on these simple operations and we 
show that they are sufficient for topology inference. We will 
revisit the problem statement and make it more precise in the 
sections for trees and DAGs. 

Operation of Sources. A pair of sources $S_1$ and $S_2$
multicast a packet each ($x_1 = [1, 0]$ and $x_2 = [0, 1]$, respectively)
to all N receivers. More generally, sources may send symbols
from a finite field $\mathbb{F}_q$. Sources send up to countMax rounds of
coordinated probes, which we call experiments. Experiments
are spaced apart by a large interval T, to ensure that only
probes in the same experiment are combined together.

Operation of Intermediate nodes. Intermediate nodes are
assumed to support unicast, multicast and the simplest possible
network coding operation, namely addition over a finite field
$\mathbb{F}_q$. They operate in time slots of pre-determined duration or
window W: a node waits for W to receive probe packets
from its incoming links; if it receives more than one probe
packet, then it codes them together; and it forwards (unicast
or multicast) the resulting probe downstream. The choice of
W affects where the probes from the two sources meet. E.g.,
by choosing W to be larger than the maximum link delay, we
can make sure that packets meet, in a hop-by-hop manner, and
are coded together.

Essentially, an intermediate node can act either as a joining 
point (J), in which case it adds multiple incoming probe 
packets and forwards the result downstream; or as a branching 
point (B), in which case it multicasts the single incoming 
packet downstream. This operation will be specified more 
precisely in the sections for trees and DAGs. 
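
The windowed coding rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheme's reference implementation: the function name `code_in_windows` is hypothetical, a window is assumed to open at the first arrival not covered by the previous window, and q is assumed prime so that addition in $\mathbb{F}_q$ is component-wise addition mod q.

```python
def code_in_windows(arrivals, window, q=2):
    """Group probe arrivals into windows and add each group over F_q.

    arrivals: iterable of (arrival_time, packet); a packet is a tuple of
              ints in [0, q).
    window:   the waiting time W of the intermediate node.
    q:        field size, assumed prime in this sketch.
    Returns one coded packet per window, in time order.
    """
    coded = []
    window_start = None
    acc = None
    for t, packet in sorted(arrivals):
        if window_start is None or t - window_start >= window:
            # The previous window (if any) is closed: emit its coded packet.
            if acc is not None:
                coded.append(tuple(acc))
            window_start, acc = t, list(packet)
        else:
            # Same window: add the new packet to the running sum.
            acc = [(a + b) % q for a, b in zip(acc, packet)]
    if acc is not None:
        coded.append(tuple(acc))
    return coded


x1, x2 = (1, 0), (0, 1)
arrivals = [(0.0, x1), (3.0, x2)]
met = code_in_windows(arrivals, window=5.0)     # W covers both arrivals
missed = code_in_windows(arrivals, window=2.0)  # W too short, no coding
```

With the larger window the node acts as a joining point and emits the single packet [1, 1]; with the shorter window the two probes are forwarded separately, which is how the choice of W affects where probes meet.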

Operation of Receivers. Each receiver i receives probes $R_i$,
which are the source packets $x_1$, $x_2$, or a linear combination
of $x_1$ and $x_2$, as the result of network coding operations at
intermediate nodes. Inference of topology is based only on the
observations $R_i$.

Intuition. Multicast as well as network coding (which, in 
this paper, is limited to simple addition, thus can be thought of 
as reverse multicast) introduce topology-dependent correlation 
in the content of packets. The content of the observations at 
the receivers can be used to reveal the underlying topology, 
and in particular, the branching and joining points. 

IV. Inferring Trees 

Overview. We design algorithms for inferring undirected 
tree topologies, based only on probes sent between leaf nodes. 
We follow a hierarchical, top-down approach, by iteratively 
dividing the tree topology into smaller clusters and revealing 
how the groups are connected to each other. 

Operation of Sources and Receivers. In each iteration 
(timeslot $T \gg W$), a set of leaves (different across timeslots)
is chosen to act as sources and the remaining leaves act as
receivers. Each source sends a distinct packet. Each receiver
stores the first packet it receives, and discards any subsequent
packets (in the same iteration). 

Operation of Intermediate Nodes. Every intermediate 
node operates in intervals of duration W. If, within W, the 
node receives a single probe from one of its neighbors, it 
multicasts it to all other neighbors. If, within W, it receives 
more than one packet from different neighbors, it adds them 
and forwards the result to all remaining neighbors. In binary 
trees, this linear combination is simply XOR. In general trees, 
we need operations over higher fields. 

Summary of Results. In the rest of this section, we first 
consider binary trees, with or without packet loss. Then we 
extend our algorithms to m-ary trees. For trees without loss, 
we design deterministic algorithms that infer the topology in 
$O(n)$ time. For trees with loss, just one successfully received
probe per network path is sufficient, without the need to collect 
packet loss statistics, a property that enables rapid discovery 
of the underlying topology. 

A. Binary Trees 

1) Lossless Binary Tree: Let us first consider the simplest 
case: an undirected binary tree without packet loss. The 
following example illustrates the main idea. 

Example 1: Consider the tree shown in Fig. 1(a) with
7 leaves (1, 2, ..., 7) and 5 intermediate nodes (A, B, C, D, E).
Assume that nodes 1 and 7 act as sources $S_1$ and $S_2$ and



Fig. 1. Example 1: inferring the topology of an undirected binary tree with 7 leaves (1, 2, ..., 7) and 5 intermediate nodes (A, B, C, D, E). (a) Undirected binary tree we want to infer. (b) Structure revealed after one iteration. Leaves 1 and 7 act as sources. Probes meet at C. (c) Structure revealed after two iterations. Leaves 5 and 6 act as sources. Probes meet at E.



Algorithm 1 Topology Inference for Lossless Tree

Procedure $(\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3, A_1, A_2, A_3)$ = SendTwoProbes($\mathcal{L}$):

• Randomly choose two leaves in $\mathcal{L}$ to act as sources $S_1$, $S_2$ and send
probes $x_1$, $x_2$ respectively. All other leaves $\mathcal{L} - \{S_1, S_2\}$ act as
receivers. Intermediate nodes act as branching or joining points in trees.

• When all receivers receive a probe, partition $\mathcal{L}$ into $\mathcal{L}_1 \cup \mathcal{L}_2 \cup \mathcal{L}_3$
as follows. Set $\mathcal{L}_1$ contains $S_1$ and all receivers that observe $x_1$. $\mathcal{L}_2$
contains $S_2$ and all receivers that observe $x_2$. $\mathcal{L}_3$ contains all receivers
that observe $x_3 = x_1 \oplus x_2$.

• If $\mathcal{L}_3$ is not empty, replace the original graph with the three components
$\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$, connected through three edges to node $A_i$ (as in Fig. 2(a)).
If $\mathcal{L}_3$ is empty, replace the original graph with two components $\mathcal{L}_1$ and
$\mathcal{L}_2$, connected through a single edge (as in Fig. 2(b)) and set $\mathcal{L}_3 = \emptyset$.

• Return the components $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$ and the nodes $A_1$, $A_2$, $A_3$ that
connect each component to the network.

InferBinaryTree:

• Call $(\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3, A_1, A_2, A_3)$ = SendTwoProbes($\mathcal{L}$), where $\mathcal{L}$ are all
the leaves.

• For each of the previously identified components $\mathcal{L}_i$, $i = 1, 2, 3$:

- if $\mathcal{L}_i$ contains one or two leaves, replace the component with either
one or two edges, connecting the leaves through node $A_i$ to the
rest of the network.

- if $\mathcal{L}_i$ contains three or more leaves, then call
SendTwoProbes($A_i \cup \mathcal{L}_i$) to reveal its structure. Node
$A_i$ that connects $\mathcal{L}_i$ to the network acts as an aggregate receiver.³

• Replace vertices of degree two with a single node.



send probes $x_1 = [1, 0]$ and $x_2 = [0, 1]$, respectively. The rest
of the leaves act as receivers. Intermediate node A receives $x_1$
and forwards it to leaf 2 and to node C. Similarly, intermediate
node D receives $x_2$ and forwards it to node E (which in turn
forwards it to leaves 5, 6) and to node C.

Probes $x_1$ and $x_2$ arrive at node C. Node C adds them,
creates the packet $x_3 = x_1 \oplus x_2 = [1, 1]$, and forwards $x_3$ to
node B, which in turn forwards it to leaves 3, 4.²

At the end, leaf 2 receives $x_1$, leaves 5, 6 receive $x_2$ and
leaves 3, 4 receive $x_3 = x_1 \oplus x_2$. Thus the leaves of the tree can
be partitioned into three sets: $\mathcal{L}_1$ containing $S_1$ and the leaves
that received $x_1$, i.e., $\mathcal{L}_1 = \{1, 2\}$; $\mathcal{L}_2 = \{5, 6, 7\}$ containing
$S_2$ and the leaves that received $x_2$; and $\mathcal{L}_3 = \{3, 4\}$ containing
the leaves that received $x_1 \oplus x_2$. From this information
observed at the edge of the network, we can deduce that
the binary tree has the structure depicted in Fig. 1(b): three
components, each seeing a different probe ($x_1$, $x_2$, $x_1 \oplus x_2$)
flowing through it, and connected through 3 links to the middle
node C. This concludes the first experiment/iteration.

To infer the structure that connects leaves {5, 6, 7} to
node C, we need a second experiment. We randomly choose
two of these three leaves, e.g., nodes 5 and 6, to act as
sources $S_1$ and $S_2$. Any probe packet leaving node D will
be multicast to all the remaining leaves of the network, i.e.,
nodes {1, 2, 3, 4} observe the same packet. One can think of
node D as a single "aggregate-receiver", which observes the
common packet received at nodes {1, 2, 3, 4}. Following the
same procedure as before, assuming that $x_1$ and $x_2$ meet at
node E, nodes 7 and {1, 2, 3, 4} receive packet $x_3 = x_1 \oplus x_2$.
Using this additional information and the fact that the topology
is a binary tree, we refine the inferred structure from Fig. 1(b)
to Fig. 1(c).

²We have chosen the directionality of the edges depending on which source
reaches the intermediate node first. In this example, we assume that all links
have the same delay. For different delays, $x_1$, $x_2$ could meet at different
nodes, but the algorithm will still work, as discussed after Lemma 4.1.

³Although we cannot directly observe $A_i$, whatever is received by $A_i$ will
be received by the leaves that are in $\mathcal{L}$ but not in $\mathcal{L}_i$; thus $A_i$ acts as an
"aggregate" receiver on their behalf.

Fig. 2. Edges and vertices of the graph, as revealed by a single iteration
(call of SendTwoProbes) in Alg. 1. The leaves $\mathcal{L}$ are partitioned into two
or three groups, based on their observations, $x_1$, $x_2$, $x_1 \oplus x_2$. (a) Dividing $\mathcal{L}$ into three components. (b) Dividing $\mathcal{L}$ into two components.

The algorithm for inferring any binary tree is shown in
Alg. 1 and generalizes the previous example. It starts by
considering all the leaves of the tree $\mathcal{L}$. It calls SendTwoProbes
and partitions the leaves into smaller sets/areas $\mathcal{L}_1$, $\mathcal{L}_2$, $\mathcal{L}_3$.
It proceeds by recursively calling SendTwoProbes within
each set, until all edges are revealed.
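
A single call of SendTwoProbes on the tree of Example 1 can be simulated as follows. This is a minimal sketch under assumptions that go beyond the text: all links have unit delay, each intermediate node forwards only once per iteration (a simplification that leaves the leaf observations unchanged, since receivers keep only their first packet), and the edge list is read off the forwarding steps of the example. The names `EDGES`, `build_adjacency` and `send_two_probes` are hypothetical.

```python
from collections import defaultdict

# Tree of Example 1: leaves 1..7, intermediate nodes A..E.
EDGES = [(1, "A"), (2, "A"), ("A", "C"), ("C", "B"), ("B", 3), ("B", 4),
         ("C", "D"), ("D", 7), ("D", "E"), ("E", 5), ("E", 6)]


def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def send_two_probes(adj, s1, s2):
    """Simulate one iteration with unit link delays; return (L1, L2, L3)."""
    x1, x2 = (1, 0), (0, 1)
    leaves = {v for v in adj if len(adj[v]) == 1}
    observed = {s1: x1, s2: x2}
    forwarded = set()             # internal nodes that already forwarded
    arriving = defaultdict(dict)  # node -> {sending neighbor: packet}
    for src, pkt in ((s1, x1), (s2, x2)):
        (neighbor,) = adj[src]    # a leaf has exactly one neighbor
        arriving[neighbor][src] = pkt
    while arriving:
        nxt = defaultdict(dict)
        for node, packets in arriving.items():
            if node in leaves:
                # A receiver keeps only the first packet it sees.
                observed.setdefault(node, next(iter(packets.values())))
            elif node not in forwarded:
                forwarded.add(node)
                coded = (0, 0)
                for pkt in packets.values():   # XOR of packets in this slot
                    coded = (coded[0] ^ pkt[0], coded[1] ^ pkt[1])
                for nb in adj[node] - set(packets):
                    nxt[nb][node] = coded
        arriving = nxt
    groups = {x1: set(), x2: set(), (1, 1): set()}
    for leaf, pkt in observed.items():
        groups[pkt].add(leaf)
    return groups[x1], groups[x2], groups[(1, 1)]


adj = build_adjacency(EDGES)
first = send_two_probes(adj, 1, 7)   # probes meet at C
second = send_two_probes(adj, 5, 6)  # probes meet at E
```

Under these assumptions, the first call reproduces the partition {1, 2}, {5, 6, 7}, {3, 4} of the example, and the second call places every leaf other than 5 and 6 in the third set, consistent with node D acting as an aggregate receiver.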

Lemma 4.1: Alg. 1 terminates in at most n iterations and
exactly infers the topology of an undirected binary tree. 



Proof: Consider a particular iteration (call of
SendTwoProbes): sources $S_1$ and $S_2$ send exactly
one probe packet each to all other leaves. Now consider the
intermediate nodes on the path $\mathcal{P}$ between the two sources.
Depending on the link delays, there are two possibilities.

The first possibility is that $x_1$ and $x_2$ meet (arrive within
the same W) at one of the internal nodes on $\mathcal{P}$, e.g., node A.
Node A forwards their XOR to its third link, and the iteration
reveals the neighboring edges and nodes to A as depicted in
Fig. 2(a). An alternative possibility is that packets $x_1$ and
$x_2$ cross each other while traversing the same link of $\mathcal{P}$ in
opposite directions, i.e., they do not meet at a node. Even if a
leaf node receives more than one probe packet, it keeps
only the first one by design. In this case
we infer the configuration in Fig. 2(b) that reveals one edge.

In summary, the algorithm iteratively divides the binary
tree into smaller components until a component has two
or fewer leaves, in which case we know its structure. In each
iteration, we reveal three edges or one edge. At the end, we
have revealed all $n - 1$ edges. Therefore, the algorithm needs
between $\frac{n-1}{3}$ and $n - 1$ iterations. □



Notes. In each iteration, every link is traversed exactly once 
by a probe. Link delays affect where the probes meet and thus 
what components are revealed in each iteration. However, they 
do not affect the correctness of the algorithm. 
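
The iteration count in the proof of Lemma 4.1 can be spelled out as a short counting argument. The symbols $k_3$ and $k_1$ are introduced here only for illustration (they do not appear in the original): they denote the number of iterations that reveal three edges and one edge, respectively, and $k$ is the total number of iterations.

```latex
\begin{align*}
3k_3 + k_1 &= n - 1, \qquad k = k_3 + k_1, \\
k &= (n - 1) - 2k_3, \qquad 0 \le k_3 \le \tfrac{n-1}{3}, \\
\tfrac{n-1}{3} &\le k \le n - 1 .
\end{align*}
```

For the tree of Example 1, with $n = 12$ vertices and 11 edges, this gives between 4 and 11 iterations once the lower bound is rounded up to an integer.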

2) Lossy Binary Tree: Packet loss may cause confusion
when dividing the receivers into components. One solution is
to send multiple probes from the same two sources during
each iteration, as we discuss next. However, given packet
loss and delay variability, this might result in probes meeting
at different nodes in the same iteration.⁴ This problem is
effectively created by the fact that we deal with undirected 
graphs, where a link may be traversed in opposite directions 
by probes sent in the same iteration. We can avoid this problem 
by fixing the directionality of the edges during each iteration. 
This can be achieved in a distributed manner by the first packet 
arriving at each intermediate node. In summary, we modify the 
operations of intermediate nodes as follows. 

Intermediate Node Operation: Each intermediate node 
keeps a table of its neighbors. In each iteration, it marks these 
neighbors as source or sink neighbors. Once this marking is 
done, it does not change for the duration of the iteration. The 
first time during an iteration that an intermediate node receives 
a probe packet, the node waits for a window W to receive 
probes from other neighbors. After this time W passes, the 
node marks all neighbors from which it received packets as 
sources and all other neighbors as sinks. For the remaining 
duration of the iteration, the node accepts packets only if they 
originate from its source neighbors. If an intermediate node 
receives a packet from one of its adjacent source neighbors, 
it forwards it to all its sink neighbors. If it receives packets
from two different source neighbors, it linearly
combines them, and forwards the result to all sink neighbors.
The node rejects probes coming from sinks, and does not 
forward packets towards sources. 

⁴This was not a problem in the lossless case. In a given iteration, since
only one probe packet is generated by each source, the packets meet at most
at one intermediate node.



Algorithm 2 Topology Inference for Lossy Binary Tree 

• If a receiver receives only $x_1$, assign it to the set $\mathcal{L}_1$.

• If a receiver receives only $x_2$, assign it to the set $\mathcal{L}_2$.

• If a receiver receives both $x_1$ and $x_2$, or it receives an $x_1 \oplus x_2$ packet,
assign it to the set $\mathcal{L}_3$.

• If a node does not receive anything, randomly assign it to one of the
components.

• For aggregate receiver nodes ($A_i$), apply the same rule using the union
of the aggregate receiver observations.



Alg. 2 presents the modifications required for Alg. 1 to be
able to infer binary trees with lossy links. The only difference
is that in each iteration, we send M probe packets from each
source instead of one. Alg. 2 has an associated probability
of error, since a leaf might not receive the "correct" probe
packet.⁵ Note that the number of packets M required to
infer the topology within a desired error probability is much
smaller than the number of packets required by methods
that collect a statistically significant number of packets and
perform estimation. For our algorithm to operate correctly, it
suffices that each node receives at least one probe packet from
each of the sources it is connected to (nodes in $\mathcal{L}_1$ or $\mathcal{L}_2$ are
connected to one source, $S_1$ or $S_2$ respectively, while nodes
in $\mathcal{L}_3$ are connected to two sources).
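
The assignment rules of Alg. 2 can be written as a small function. This is a sketch for illustration only: the name `assign_component` and the representation of a receiver's observations as a set of packet tuples are assumptions, not part of the original algorithm description.

```python
import random

X1, X2, X3 = (1, 0), (0, 1), (1, 1)


def assign_component(received, rng=random):
    """Return 1, 2 or 3: the set a receiver joins, given the probes it received.

    received: collection of packets observed during one iteration.
    rng:      object with a choice() method, used only when nothing arrived.
    """
    got_x1 = X1 in received
    got_x2 = X2 in received
    got_x3 = X3 in received
    if got_x3 or (got_x1 and got_x2):
        return 3
    if got_x1:
        return 1
    if got_x2:
        return 2
    # Nothing received: random assignment, as in Alg. 2.
    return rng.choice([1, 2, 3])


# An aggregate receiver applies the same rule to the union of the
# observations of the leaves it stands for.
aggregate = assign_component(set().union([X1], [], [X2]))
```

Here `aggregate` falls in the third set because, taken together, the leaves behind the aggregate receiver saw both $x_1$ and $x_2$.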

B. M-ary Trees 

1) Full M-ary Trees: First let us consider full m-ary trees,
i.e., trees with intermediate nodes that have degree exactly
$m + 1$, $m \ge 3$, without packet loss. Alg. 1 can still accurately
infer the topology in less than n iterations.

However, it is possible to modify the algorithm and infer 
the topology even faster. The idea is to keep the hierarchical 
clustering approach but increase the number of components 
revealed in each iteration, either (i) by changing the interme- 
diate nodes so that they forward different linear combinations 
of the incoming probes to different outgoing links; or (ii) by 
increasing the number of sources in each iteration. 

Modification I: (two sources per iteration, coding points
send different combinations to different links). When an
intermediate node receives two incoming packets from two
different neighbors, it deterministically generates different
linear combinations, e.g., $x_1 + x_2$, $x_1 + 2x_2$, ..., and forwards
each resulting packet to a different neighbor. Therefore, when
$x_1$ and $x_2$ meet at any intermediate node, the leaves of the
network will be divided into $m + 1$ components, depending
on which probe they receive. If the probes do not meet at
a node but cross each other, the leaves of the network will
be divided into two components. Once a component has m
or fewer leaves, since we have a full m-ary tree, we know its
structure. Therefore, in each iteration, we reveal $m + 1$ edges
or one edge, and the total number of iterations is
at least $\frac{n-1}{m+1}$ and at most $n - 1$. Note that the probes need to
be chosen from $\mathbb{F}_m^2$, instead of $\mathbb{F}_2^2$.
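
The distinct combinations used in Modification I can be generated as in the sketch below. It assumes that m is prime, so that integer arithmetic mod m realizes the field $\mathbb{F}_m$; the function name `distinct_combinations` is hypothetical. A coding point of degree $m + 1$ with two incoming links has $m - 1$ outgoing links, so it needs $m - 1$ distinct combinations.

```python
def distinct_combinations(x1, x2, m):
    """Return the m - 1 packets x1 + c*x2 for c = 1, ..., m - 1 (mod m).

    Assumes m is prime, so that the integers mod m form the field F_m.
    """
    return [tuple((a + c * b) % m for a, b in zip(x1, x2))
            for c in range(1, m)]


# Degree-6 coding point (m = 5): four outgoing links, four combinations.
combos = distinct_combinations((1, 0), (0, 1), m=5)
```

Together with the two components that observe $x_1$ and $x_2$ unchanged, the $m - 1$ combinations account for the $m + 1$ components revealed when the probes meet.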

⁵For example, in a given iteration, an error may occur either because a
node does not receive any probe packet (the probability of which can be made
arbitrarily small by increasing the number of probe packets M) or because it
belongs in $\mathcal{L}_3$ but happens to receive only $x_1$ or only $x_2$ packets. This
probability decreases very fast as M increases, as observed in the simulations of Section VI.



Modification II: (more than two sources per iteration,
coding points send the same combination to all outgoing
links). Alternatively, we can use up to m sources (as per
Lemma 4.2) per iteration. The probe packets are chosen
from $\mathbb{F}_2^m$, and the sources send $x_1 = [1, 0, 0, \ldots, 0]$, $x_2 =
[0, 1, 0, \ldots, 0]$, ..., $x_m = [0, 0, 0, \ldots, 1]$, respectively. When an
intermediate node receives k packets from different neighbors,
within W, where $2 \le k \le m$, it simply adds them up
and forwards the result to the single remaining non-source
neighbor. Depending on whether the node receives k packets
or only a single packet, the leaves of the network will be
divided into $m + 1$ or $m$ components; i.e., in each
iteration, we reveal $m + 1$ or $m$ edges. Therefore, the algorithm
requires at least $\frac{n-1}{m+1}$ and at most $\frac{n-1}{m}$ iterations.

Lemma 4.2: The maximum number of sources that can be
used to uniquely infer the topology of a full $m$-ary tree is $m$.
Proof: We show that if we use $m + 1$ sources to infer
the topology of a full $m$-ary tree, it cannot be uniquely
identified. Consider a binary tree with three sources sending
$x_1 = [1, 0, 0]$, $x_2 = [0, 1, 0]$, and $x_3 = [0, 0, 1]$, respectively,
to all other leaves in the tree. Assume that the three probe
packets meet at one intermediate node; thus, we divide the
tree into four components, which observe $x_1$, $x_2$, $x_3$, and
$x_1 + x_2 + x_3 = [1, 1, 1]$, respectively. Since the degree of
intermediate nodes is three, we conclude that first, two of the
three sources must have joined at one intermediate node, and
then, their result must have joined with the third source in
another intermediate node, so that they result in $x_1 + x_2 + x_3$ in
the last component. The first two sources can be either $x_1, x_2$
or $x_2, x_3$: we cannot uniquely infer the underlying binary tree
from observing these four components. The same discussion
applies to larger full $m$-ary trees. ■
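
The ambiguity used in the proof can be checked numerically. The sketch below is a toy illustration with hypothetical names (`add_f2`, `leaf_observation`): it merges the three probes in two different orders and shows that the last component observes the same vector either way, so the observation alone cannot reveal which pair joined first.

```python
def add_f2(a, b):
    """Component-wise addition of two probe vectors over F_2."""
    return [(x + y) % 2 for x, y in zip(a, b)]


x1, x2, x3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]


def leaf_observation(first_pair, third):
    """Two probes join at one node; the result joins the third probe at another."""
    partial = add_f2(*first_pair)
    return add_f2(partial, third)


obs_a = leaf_observation((x1, x2), x3)  # x1 and x2 join first
obs_b = leaf_observation((x2, x3), x1)  # x2 and x3 join first
print(obs_a, obs_b, obs_a == obs_b)
```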

Note. In the presence of packet loss, we can apply the same
argument as in Section IV-A2, i.e., we can assign directions
to the links during each iteration, so that our algorithms are
applicable to the lossy case as well.

2) General M-ary Trees: In general m-ary trees, i.e., with 
degree of intermediate nodes being from three up to a maxi- 
mum of ?7i + l, we can apply Alg. Q~]and infer the tree topology 
in $O(n)$ iterations. We can also apply Modification I, described
in Section IV-B1. The probes should be chosen from $\mathbb{F}_m^2$ since
they may meet at an intermediate node of degree $m + 1$. Note
that we cannot apply Modification II here: since probes may
meet at an internal node of degree 3, we cannot use more
than two sources, although there are nodes with degree up to
$m + 1$.



(Figure annotation: 2 sources meet only once, for all receivers.)

Fig. 4. Single-tree vs. multiple-tree topologies. Consider a single iteration.
In a multiple-tree topology, unlike the single-tree topology, the observations
at the receivers no longer uniquely identify the topology.



V. Inferring Directed Acyclic Graphs (DAGs)
A. From a Single-Tree to Multiple-Tree Topologies

So far, we considered undirected trees. Let us now consider
directed trees, which are a special case of DAGs.

Example 2: Assume that we assign directions to the links
of the binary tree in Fig. 1(a), all from the top to the bottom.
Clearly, we can no longer send probe packets in arbitrary
directions in each iteration. However, we can still infer some
information about the topology. Assume that we send probes
from the source nodes 1 and 2, and we observe $x_1 \oplus x_2$ at
the receiver nodes 5, 6, and 7. Therefore, we identify three
components $\mathcal{L}_1 = \{1\}$, $\mathcal{L}_2 = \{2\}$, and $\mathcal{L}_3 = \{5, 6, 7\}$,
together with the intermediate nodes A and D, and three
edges 1A, 2A, and AD, that connect the three components
together. However, we cannot obtain more information about
the internal structure of the component $\mathcal{L}_3$ or any other part
of the tree network. ■

Next, consider a 2-by-2 network as defined in Section III,
i.e., a directed acyclic graph (DAG) with two sources, two
receivers and predetermined routing. We note that directed
trees are only one type among all four types of the basic
2-by-2 components of any multiple-tree network, as defined
in Section III. There exist four 2-by-2 topologies, as shown
in Fig. 3, which were first defined in [2], [40]. Following the
same terminology as in [2], [40], we refer to Fig. 3(a), (b), (c),
and (d) as type 1, 2, 3, and 4, respectively. Type 1 was called
"shared" in [2], [40], since the joining points for both receivers
coincide ($J_1 = J_2$) and the branching points for both sources
coincide ($B_1 = B_2$). The other three types (2, 3, 4) are called
"non-shared" since they have two distinct joining points and
two distinct branching points.

In a directed tree, all 2-by-2 components are of type 1.
However, in a general M-by-N topology, several different 2-by-2
types may co-exist. The algorithms described so far
can identify type 1 2-by-2 topologies, and thus, trees (either
completely or partially, as described above). However, they
cannot distinguish between type 1 and type 4 2-by-2's.

Example 3: Consider Fig. 3(a) and (d). Assume that in both
cases, we send $x_1, x_2$ from $S_1, S_2$ to $R_1, R_2$ and that $x_1, x_2$
meet (i.e., arrive within the same $W$) at any joining point.
Therefore, in both type 1 and 4 topologies, both receivers
observe $x_1 + x_2$, and we cannot distinguish between types.

In general, unlike single-tree networks, the observations do
not uniquely characterize the underlying topology in multiple-tree
networks. The reason is that once two sources in a tree
network transmit their probe packets, they meet at no more than one
coding point for all the receivers, as we see in Section IV. On
the other hand, in a multiple-tree network, the probe packets
may meet at different coding points for different receivers, as
depicted in Fig. 4. Therefore, we need a different approach.

Problem Statement. Our goal in this section is to infer
a multiple-tree topology, or an "M-by-N" topology according
to the terminology of Section III. Similarly to [2], we take
two steps. In the first step (Section V-B), we use several
experiments and we exactly identify the type of every 2-by-2
component. In the second step (Section V-C), we merge
these 2-by-2 subnetwork components to reconstruct the M-by-N
network.

(a) type 1: shared   (b) type 2: non-shared   (c) type 3: non-shared   (d) type 4: non-shared

Fig. 3. The four possible types of a 2-by-2 subnetwork component, as defined in [2]. There are two sources ($S_1$, $S_2$) multicasting packets
$x_1$, $x_2$ to two receivers ($R_1$, $R_2$). (The 1-by-2 topology of $S_1$ is a tree composed of $S_1$, $B_1$, $R_1$, $R_2$. Similarly, the 1-by-2 tree rooted at $S_2$
is $S_2$, $B_2$, $R_1$, $R_2$. $J_1$ and $J_2$ are the joining points, where the paths from $S_2$ to $R_1$ and $R_2$ join/merge with $S_1$'s topology.)

Operation of Sources. Pairs of sources are selected and
send up to countMax coordinated multicast packets to all
receivers. As in the general setup, probes are spaced apart by
intervals of $T$. In addition, we introduce a difference in the
sending time of the two sources, which we call the offset $u$.
W.l.o.g., let $S_1$ send first and $S_2$ second.

The timing parameters $T$, $u$, $W$ are coarsely tuned so as
to create observations that can distinguish among the 2-by-2
types. In particular, (i) $T \gg W$ ensures that only probes
within the same experiment are coded together; (ii) $W \gg$
link delay ensures source packets meet at the joining points
despite link delays; (iii) $u$ is selected randomly in each
iteration, so that it forces probes to meet at different points, or
not meet at all, in different iterations. Finally, coarse selection
of $T$, $W$ with rough estimates of upper bounds on link and
path delays is sufficient.

Operation of Receivers. For a given 2-by-2 subnetwork, let
the observations at the two receivers be $R_1 = c_{11}x_1 + c_{12}x_2$,
$R_2 = c_{21}x_1 + c_{22}x_2$. Based on these observations, we design
Inference algorithms that identify the 2-by-2 type (in Section
V-B) and Merging algorithms that build the M-by-N from the
2-by-2's (in Section V-C).

Operation of Intermediate Nodes. In DAGs, the operation
of an intermediate node, depending on whether it acts as a
joining or a branching point, is summarized in Alg. 3 and
Alg. 4, respectively. A joining point (J) adds and forwards
packets, while a branching point (B) forwards the single
received packet to all "interested" links downstream. A link
is "interested" in the routing sense if it is the next hop for at
least one source packet in the network coded packet.
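
The two node roles can be sketched as follows. This is a minimal illustration under our own assumptions: a packet is represented by its coefficient pair `(c1, c2)`, standing for $c_1 x_1 + c_2 x_2$ over a field of prime order `q`, and the function and parameter names are hypothetical.

```python
def joining_point(packets_in_window, q):
    """Add all packets that arrived within one window W and forward the sum.

    Each packet is a coefficient pair (c1, c2) meaning c1*x1 + c2*x2 over F_q
    (q assumed prime). Returns None when nothing arrived in the window.
    """
    if not packets_in_window:
        return None
    c1 = sum(p[0] for p in packets_in_window) % q
    c2 = sum(p[1] for p in packets_in_window) % q
    return (c1, c2)


def branching_point(packet, next_hops_s1, next_hops_s2, all_links):
    """Return the set of outgoing links that are 'interested' in the packet."""
    c1, c2 = packet
    if c1 != 0 and c2 == 0:      # a plain x1 packet
        return set(next_hops_s1)
    if c1 == 0 and c2 != 0:      # a plain x2 packet
        return set(next_hops_s2)
    return set(all_links)        # a coded packet a*x1 + b*x2


# x2 crossing two joining points on the way to a receiver yields x1 + 2*x2.
after_first = joining_point([(1, 0), (0, 1)], q=3)
after_second = joining_point([after_first, (0, 1)], q=3)
print(after_first, after_second)
```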

B. Identifying 2-by-2 Components 

In this section, we propose an approach for exactly identifying
the type of a 2-by-2 component, using the same intuition
as in trees, i.e., coding operations result in observations that
can uniquely characterize the underlying 2-by-2. Our approach
builds on [2] and improves over it by uniquely distinguishing
among all four 2-by-2 types, while [2] was able to distinguish
among shared (type 1) and non-shared (types 2, 3, 4) only.

Algorithm 3 Operation at Joining Point J, in DAGs. When
two sources multicast to N receivers, J knows that it has two
incoming links and one outgoing link. Additions are over $\mathbb{F}_q$.

for every time window W do
    if (J receives 2 packets within W from its incoming links) then
        as soon as the last one arrives, it adds them up, and forwards the resulting packet downstream
    else if (J receives only one packet within W) then
        it forwards the packet downstream
    else if (J does not receive any packet within W) then
        /* nothing to do */
    end if
end for

Algorithm 4 Operation at Branching Point B, in DAGs.
While two sources multicast to N receivers, B has one
incoming packet and multiple outgoing links.

for each incoming packet do
    if the incoming packet is x_1 (or x_2) then
        forward it only on the outgoing links that are next hops for S_1 (S_2)
    else
        /* The incoming packet is of the form a x_1 + b x_2. */
        forward the packet to all outgoing links
    end if
end for

1) Lossless 2-by-2: First, we provide an algorithm for
identifying the type of a 2-by-2 component without loss. In
the first experiment, sources $S_1, S_2$ multicast probe packets
$x_1, x_2$ to $R_1, R_2$. We begin with the assumption that $S_1, S_2$
act simultaneously, or in practice within the synchronization
offset. A choice of large $W$ guarantees that $x_1$ and $x_2$ meet
at both joining points $J_1, J_2$ that add the incoming packets
over $\mathbb{F}_3$. Depending on the underlying 2-by-2 type, $R_1, R_2$
will observe one of the following pairs:



type 1:   $R_1$: $x_1 + x_2$,    $R_2$: $x_1 + x_2$
type 2:   $R_1$: $x_1 + x_2$,    $R_2$: $x_1 + 2x_2$
type 3:   $R_1$: $x_1 + 2x_2$,   $R_2$: $x_1 + x_2$
type 4:   $R_1$: $x_1 + x_2$,    $R_2$: $x_1 + x_2$



Types 2 and 3 result in unique observations that make them
distinguishable from any other type; i.e., one such observation
suffices to identify type 2 or 3. However, types 1 and 4 result
in the same pair of observations; therefore, we need to design
different experiments to get observations that can uniquely
characterize type 1 or type 4.

TABLE I
Lossless Case. Possible observations for types 1 and 4 2-by-2
topologies. (Observation #1 occurs when the sources are
synchronized. Observations #2-4 occur when $S_2$ sends after $S_1$,
by an offset $u \in [f \cdot W, W]$.)

Observation | Type 1                | Type 4
Number      | R1         R2         | R1         R2
------------+-----------------------+----------------------
1           | x1 + x2    x1 + x2    | x1 + x2    x1 + x2
2           | x1         x1         | x1         x1
3           |                       | x1 + x2    x1
4           |                       | x1         x1 + x2

Algorithm 5 Lossless Case - Inferring a 2-by-2 component.
Sources $S_1, S_2$ multicast $x_1, x_2$. Receivers observe
$R_1 = c_{11}x_1 + c_{12}x_2$ and $R_2 = c_{21}x_1 + c_{22}x_2$.

n = 1; /* first experiment */
if c_22 > c_12 then
    Output type 2.
else if c_22 < c_12 then
    Output type 3.
else
    /* It is R_1 = R_2 */
    while n < countMax & R_1 == R_2 do
        Draw offset u uniformly at random out of [f · W, W].
        Send probes; S_2 transmits u time later than S_1.
        if R_1 ≠ R_2 then
            Output type 4; Exit;
        end if
        n++;
    end while
    Output type 1; /* It is n == countMax */
end if

In the next experiment, we exploit the observation, first
made in [2], that type 1 is the only 2-by-2 where the two
joining points coincide ($J_1 = J_2 = J$). Therefore, the
observations at the two receivers are always the same: either
$x_1 + x_2$ when the two packets meet at $J$; or a single packet
($x_1$ or $x_2$) when the two packets do not meet at $J$. In contrast,
type 4 has two different joining points $J_1 \neq J_2$. If we force
packets to meet only at one of the joining points but not at the
other one, the receivers will have different observations. These
are observations #3 and #4 in Table I, and they can uniquely
characterize type 4.

These observations can be achieved by appropriately selecting
the offset $u$ in the sources' sending times, i.e., $u$ needs to
be large enough so that, once added to the link delays, it can
affect $W$: if $D_1, D_2$ are the delays on the paths from $S_2$ to
$J_1, J_2$, respectively, then $u$ must be in $[W - D_1, W - D_2]$ (see footnote 6).
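
The role of the offset can be illustrated with a deliberately coarse timing model. The model is our assumption, not the paper's: the arrival of $x_1$ at a joining point is taken as time zero, $x_2$ arrives at time $u + D_k$, and the two packets are added only if that arrival falls within the window $W$. The helper names are hypothetical.

```python
def meets(u, delay_s2_to_j, window):
    """True if x2, sent u later than x1, still arrives within the window W.

    Assumption: x1 reaches the joining point at time 0 and opens the window.
    """
    return u + delay_s2_to_j <= window


def receivers_differ(u, d1, d2, window):
    """In a type 4 component the receivers differ iff the probes meet at
    exactly one of the two joining points."""
    return meets(u, d1, window) != meets(u, d2, window)


W, D1, D2 = 100.0, 30.0, 10.0
for u in (50.0, 80.0, 95.0):
    print(u, receivers_differ(u, D1, D2, W))
```

With these illustrative numbers, only the offset between $W - D_1 = 70$ and $W - D_2 = 90$ separates the two receivers; smaller offsets let the probes meet at both joining points and larger ones at neither.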

Alg. 5 summarizes the experiments we perform in order to
infer the type of a 2-by-2 network. Types 2, 3 are identified in
the first observation. Type 4 is identified the first time that the
two receivers see different observations. If, after countMax
trials, we still have not seen any different observations at the
two receivers, then we declare the 2-by-2 to be of type 1.
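
The procedure can be sketched in Python as follows. This is an illustrative reading of Alg. 5, not the authors' implementation: the callback `run_experiment(u)` stands in for sending one pair of probes with offset `u` and returning the two observations as coefficient pairs, and the toy experiments, delays, and parameter values are all assumptions.

```python
import random


def infer_lossless(first_obs, run_experiment, count_max, f, window, rng=random):
    """Infer the 2-by-2 type from a synchronized first observation plus
    up to count_max - 1 experiments with a random offset in [f*W, W]."""
    (_, c12), (_, c22) = first_obs
    if c22 > c12:
        return 2
    if c22 < c12:
        return 3
    for _ in range(count_max - 1):
        u = rng.uniform(f * window, window)
        r1, r2 = run_experiment(u)
        if r1 != r2:
            return 4          # different observations can only come from type 4
    return 1                  # never saw a difference: declare type 1


def type1_experiment(u, delay=20.0, window=100.0):
    """Toy type 1 component: a single joining point, so R1 == R2 always."""
    obs = (1, 1) if u + delay <= window else (1, 0)
    return obs, obs


def type4_experiment(u, d1=30.0, d2=10.0, window=100.0):
    """Toy type 4 component: two joining points with different delays."""
    r1 = (1, 1) if u + d1 <= window else (1, 0)
    r2 = (1, 1) if u + d2 <= window else (1, 0)
    return r1, r2


same = ((1, 1), (1, 1))
print(infer_lossless(same, type1_experiment, 50, 0.5, 100.0, random.Random(0)))
print(infer_lossless(same, type4_experiment, 50, 0.5, 100.0, random.Random(0)))
```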

Choosing countMax. The larger the number of experi- 
ments, countMax, the smaller the error probability. Define 

6 In 2-by-2 components, this interval is close to $W$ since $D_1, D_2$ are small
compared to $W$. In more general 2-by-N networks that we consider for our
simulations, there exist multiple links between the sources and joining points,
link delays are on the order of tens of ms, and $W$ is on the order of
hundreds of ms. Therefore, we can safely choose $u \in [f \cdot W, W]$ in the
general case, where < / < 1 is a tunable parameter. We choose / = ^ in 
our simulations, to force different observations at the two receivers. 



$X = I\{R_1 = R_2\}$ to indicate whether the two receivers
see the same observation. $X$ is a Bernoulli random variable
with probability $p = \Pr\{R_1 = R_2\} \in [0, 1]$. The number
of experiments we need to distinguish between types 1 and
4 is a geometric random variable, which stops the first time
that we get different observations $R_1 \neq R_2$. If the 2-by-2
is of type 1, then $X = 1$ always, $p = 1$, and Alg. 5
cannot make a mistake. The only possible error is to mistakenly
declare a type 4 topology as type 1. Assume that the two
receivers had the same observations in countMax trials, i.e.,
$(X_1, X_2, ..., X_{countMax}) = (1, 1, ..., 1)$, so that Alg. 5 infers the
2-by-2 as type 1. Let $\alpha$ be the probability of a correct decision
after countMax experiments:

$$\Pr(\text{type} = 1 \mid 1, 1, \ldots, 1) = \frac{1}{1 + \Pr(1, 1, \ldots, 1 \mid \text{type} = 4)} = \alpha, \qquad (1)$$

where we assume that the underlying topology is
equally likely to be type 1 or type 4. We have that

$$\Pr(1, 1, \ldots, 1 \mid \text{type} = 4) = \left[\Pr(X = 1 \mid \text{type} = 4)\right]^{countMax}.$$

Also recall that the offset $u$ is drawn uniformly at random
from $[f \cdot W, W]$ (as described in footnote 6). Therefore,

$$\Pr(X = 1 \mid \text{type} = 4) = \Pr\{u \notin [W - D_1, W - D_2]\} = 1 - \frac{|D_1 - D_2|}{(1 - f)W}.$$

$|D_1 - D_2|$ can be as small as 0; therefore, $1 - \frac{|D_1 - D_2|}{(1 - f)W}$
(the probability of having similar observations in type 4)
can be close to 1. For Eq. (1) to hold for $\alpha = 99\%$, we
need countMax = 458. This is a pessimistic upper bound:
simulations in Section VI show that countMax is much
smaller in practice.
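
Eq. (1) can be inverted numerically to size countMax. In the sketch below, the value `p_same = 0.99` for $\Pr(X = 1 \mid \text{type} = 4)$ is our own assumption (the text does not state it here); it happens to reproduce the quoted figure of 458 for $\alpha = 99\%$. The function names are hypothetical.

```python
import math


def correct_probability(p_same, count_max):
    """alpha = 1 / (1 + p^countMax), with equal priors on types 1 and 4."""
    return 1.0 / (1.0 + p_same ** count_max)


def required_count_max(alpha, p_same):
    """Smallest countMax with 1 / (1 + p^countMax) >= alpha.

    Rearranged: p^countMax <= (1 - alpha) / alpha.
    """
    return math.ceil(math.log((1.0 - alpha) / alpha) / math.log(p_same))


n = required_count_max(alpha=0.99, p_same=0.99)
print(n, correct_probability(0.99, n))
```

A smaller `p_same`, i.e., a larger delay difference relative to $(1 - f)W$, lowers the required number of experiments quickly, which is consistent with the remark that the bound is pessimistic.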

2) Lossy 2-by-2: Let us now consider a 2-by-2 network
where packets may be lost on some links. In this case, we
can no longer guarantee meetings of $x_1$ and $x_2$ at the joining
points and predictable observations at the receivers. There
are two differences from the lossless case. First, because of
random packet loss, each experiment might result in different
outcomes, shown in Table II. Second, there are common
observations across all four types, as opposed to just between
types 1 and 4. We divide the observations in Table II into
three groups: (i) at least one of the receivers does not receive
any packet ("-") due to loss, (ii) both receivers have the same
observation $R_1 = R_2$, and (iii) the two receivers have different
observations $R_1 \neq R_2$.

We choose to ignore the observations of group (i) because
they can occur in any of the four 2-by-2 types and thus do
not help to distinguish among them in the deterministic way
adopted in this paper. Observations of group (ii) can also be
the result of any 2-by-2 type: unlike the lossless case, where
$R_1 = R_2$ is unique to type 1 or 4 topologies, any of the four
topologies may result in such observations if some packets
are lost. We observe that group (ii) observations are the only possibility for
a type 1 topology, apart from group (i), which we ignore, while
all other three 2-by-2 types may result in either $R_1 = R_2$ or
$R_1 \neq R_2$. Therefore, if after countMax trials, we have only
observations from group (ii), we declare the topology to be
type 1.

TABLE II
Lossy Case. Possible observations for all four types of 2-by-2 topologies. (Sources send synchronized and $W$ is large.
Observation #13 for types 2 and 3 occurs only when $S_2$ sends with offset $u \in [f \cdot W, W]$ after $S_1$.) We divide the observations into
three groups: (i) at least one receiver does not receive any packet, (ii) $R_1 = R_2$, (iii) $R_1 \neq R_2$.

Obs. | Type 1                    | Type 2                    | Type 3                    | Type 4
#    | Group  R1       R2        | Group  R1       R2        | Group  R1       R2        | Group  R1       R2
-----+---------------------------+---------------------------+---------------------------+--------------------------
1    | (i)    -        -         | (i)    -        -         | (i)    -        -         | (i)    -        -
2    | (i)    -        x1+x2     | (i)    -        x1+2x2    | (i)    x1+2x2   -         | (i)    -        x1+x2
3    | (i)    -        x1        | (i)    -        x1+x2     | (i)    x1+x2    -         | (i)    -        x1
4    | (i)    -        x2        | (i)    -        x1        | (i)    x1       -         | (i)    -        x2
5    | (i)    x1+x2    -         | (i)    -        x2        | (i)    x2       -         | (i)    x1+x2    -
6    | (i)    x1       -         | (i)    x1+x2    -         | (i)    -        x1+x2     | (i)    x1       -
7    | (i)    x2       -         | (i)    x1       -         | (i)    -        x1        | (i)    x2       -
8    | (ii)   x1+x2    x1+x2     | (i)    x2       -         | (i)    -        x2        | (ii)   x1+x2    x1+x2
9    | (ii)   x1       x1        | (ii)   x1+x2    x1+x2     | (ii)   x1+x2    x1+x2     | (ii)   x1       x1
10   | (ii)   x2       x2        | (ii)   x1       x1        | (ii)   x1       x1        | (ii)   x2       x2
11   |                           | (ii)   x2       x2        | (ii)   x2       x2        | (iii)  x1       x1+x2
12   |                           | (iii)  x1+x2    x1+2x2    | (iii)  x1+2x2   x1+x2     | (iii)  x1+x2    x1
13   |                           | (iii)  x1       x1+x2     | (iii)  x1+x2    x1        | (iii)  x1       x2
14   |                           | (iii)  x1       x2        | (iii)  x2       x1        | (iii)  x2       x1
15   |                           | (iii)  x1+x2    x2        | (iii)  x2       x1+x2     | (iii)  x1+x2    x2
16   |                           |                           |                           | (iii)  x2       x1+x2

In observations of group (iii), it is $R_1 \neq R_2$, which means
that $c_{12} \neq c_{22}$ and/or $c_{11} \neq c_{21}$. An important observation is
that the difference of the coefficients between the two receivers
contains topology-related information. W.l.o.g., we focus on
the coefficient of $x_2$ and look at the difference $c_{12} - c_{22}$.
Table II shows that $c_{12} - c_{22} < 0$ can only occur in type
2 or type 4 topologies, while $c_{12} - c_{22} > 0$ can only occur
in a type 3 or 4 topology. Note that the coefficient is larger
on one side (e.g., $c_{12} > c_{22}$) when the probe ($x_2$) goes
through two joining points on its way to one receiver (in
this case, $R_1$) and through one joining point on its way to
the other receiver ($R_2$). By performing several independent
experiments and collecting several observations of group (iii),
we can distinguish among the candidate topologies. If after
countMax experiments, there are only observations of group
(ii) or (iii) with $c_{12} - c_{22} < 0$, we declare the topology as
type 2. If there are only observations of group (ii) or (iii) with
$c_{12} - c_{22} > 0$, we declare it as type 3. If there are observations
of group (ii) or (iii) with both $c_{12} - c_{22} < 0$ and $c_{12} - c_{22} > 0$,
we declare it as type 4.
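
The decision rules above can be sketched as a small classifier over a batch of collected observations. This is an illustrative reading under our own representation: each observation is a pair of coefficient tuples `((c11, c12), (c21, c22))`, `(0, 0)` stands for a receiver that got nothing, and the return value 0 means that no usable observation was seen. The function name is hypothetical.

```python
def infer_lossy(observations):
    """Classify a 2-by-2 component from lossy observations.

    Returns 1, 2, 3, or 4, or 0 if every observation fell in group (i).
    """
    inferred = 0
    for (c11, c12), (c21, c22) in observations:
        if (c11, c12) == (0, 0) or (c21, c22) == (0, 0):
            continue                      # group (i): carries no information
        if c22 > c12:                     # c12 - c22 < 0: type 2 or type 4
            if inferred == 3:
                return 4
            inferred = 2
        elif c22 < c12:                   # c12 - c22 > 0: type 3 or type 4
            if inferred == 2:
                return 4
            inferred = 3
        elif inferred == 0 and (c11, c12) == (c21, c22):
            inferred = 1                  # only group (ii) seen so far
    return inferred


print(infer_lossy([((1, 1), (1, 1)), ((1, 1), (1, 2))]))   # group (ii), then c22 > c12
print(infer_lossy([((1, 0), (1, 1)), ((1, 1), (1, 0))]))   # both signs observed
```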

In our experiments, we try to create those observations
that reveal the topology. These can occur either naturally,
as the result of packet loss, or artificially, by us introducing
an offset $u$ in $S_2$'s sending time with respect to $S_1$. To
help these observations occur, especially for small loss rate,
and similarly to the lossless case, we use a random offset
$u \in [f \cdot W, W]$. To make these experiments independent, we
space apart successive sets of probes by roughly selecting
$T > 3W$, which is sufficient since there are at most two
joining points on any $(S_i, R_j)$ path in a 2-by-2.

Alg. 6 summarizes the 2-by-2 inference for lossy networks.
The algorithm is simple and follows a deterministic approach:
one observation, or a set of observations, is sufficient to
uniquely distinguish among types. For example, at least one
observation of group (iii) rules out the type 1 topology; a
pair of group (iii) observations with both $c_{12} - c_{22} > 0$ and
$c_{12} - c_{22} < 0$ indicates type 4; etc. As a result, we require
fewer experiments compared to the thousands of arrival order
measurements required by [2], [40] for statistical significance. In
addition and more importantly, we identify the exact 2-by-2
type while [2] only distinguishes between shared and non-shared.

Algorithm 6 Lossy Case - Inferring a 2-by-2 component.
Sources $S_1, S_2$ multicast $x_1, x_2$. Receivers observe
$R_1 = c_{11}x_1 + c_{12}x_2$ and $R_2 = c_{21}x_1 + c_{22}x_2$. The variable type
stores our estimate of the type of the 2-by-2 component and
it gets updated during the experiments.

n = 1; /* first experiment */
type = 0; /* initialization */
while n < countMax do
    if R_1 ≠ [0, 0] & R_2 ≠ [0, 0] then
        if c_22 > c_12 then
            if type ≠ 3 then
                type = 2;
            else
                type = 4; Break;
            end if
        else if c_22 < c_12 then
            if type ≠ 2 then
                type = 3;
            else
                type = 4; Break;
            end if
        else if type == 0 & R_1 == R_2 then
            type = 1;
        end if
    end if
    n++;
    Draw offset u uniformly at random out of [f · W, W].
    Send probes; S_2 transmits u time later than S_1.
end while
Output type.

3) Inferring all 2-by-2's in a 2-by-N Network: Algorithms
3 and 6 can be directly applied to a 2-by-N network, i.e., a
network where two sources multicast to $N$ receivers. A difference
is that intermediate nodes need to perform addition over
a larger finite field (of order larger than the maximum number
of joining points on a path). Algorithm 5 and Algorithm 6 can
be performed on any pair of receivers among all $\binom{N}{2}$ possible
pairs. The same set of 2-by-N probes can be used to infer, in
parallel and independently, the type of all 2-by-2 topologies.
This reduces the number of probes, as we re-use them, instead
of sending $\binom{N}{2}$ different sets of probes. The 2-by-N structure
is important for the merging algorithm in Section V-C.
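
The probe re-use can be sketched as a simple regrouping of one shared probe log. The log format (one dictionary per probe round, mapping a receiver name to its observed coefficient pair) and the function name are assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_observations(probe_log):
    """Regroup one 2-by-N probe log into per-pair observation sequences.

    probe_log: list of dicts, one per probe round, receiver -> observation.
    Returns a dict keyed by receiver pair, each value a list of (R_a, R_b).
    """
    receivers = sorted(probe_log[0])
    return {
        (a, b): [(rnd[a], rnd[b]) for rnd in probe_log]
        for a, b in combinations(receivers, 2)
    }


log = [
    {"R1": (1, 1), "R2": (1, 1), "R3": (1, 2)},
    {"R1": (1, 0), "R2": (1, 0), "R3": (1, 1)},
]
pairs = pairwise_observations(log)
print(len(pairs), pairs[("R1", "R3")])
```

Each per-pair sequence can then be fed independently to a 2-by-2 classifier, so the number of probe rounds does not grow with the number of receiver pairs.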

4) 2-by-2's vs. other Subnetwork Components: We now
discuss why we choose to decompose an M-by-N network
into 2-by-2 subnetwork components, as opposed to any other
subnetwork structures, e.g., $m$-by-$n$'s, where $1 \le m \le M$,
$1 \le n \le N$.

• 1-by-1: This is the smallest component and corresponds
to measuring a single end-to-end path. However, it
reveals neither joining nor branching points.

• 1-by-2 and 2-by-1: These correspond to a 2-leaf multicast
or a reverse-multicast tree, respectively. The 2-by-1
consists of 2 sources, one coding point, 1 receiver. The
2-by-1 cannot identify the branching points while the
1-by-2 cannot identify the joining points. Similar comments
apply to M-by-1 and 1-by-N.

• 2-by-2: This is the smallest structure that gives information
about the relative locations of joining and branching
points.

• $m$-by-$n$, with $2 \le m \le M$, $2 \le n \le N$: If we consider
larger structures, there is an exponentially larger number
of possible types, which requires more complicated inference
algorithms. E.g., there exist 19 possible types for a
2-by-3.

• M-by-N: In the extreme case, we need to enumerate all
possible M-by-N topologies, as in [25].

The larger the subnetwork component we use as a building
block, the fewer components we need to infer and the simpler
the merging algorithm. However, as the size of the basic
component grows, the number of possible types increases
exponentially and the inference step becomes increasingly
complex. In this paper, we choose to decompose an M-by-N
into 2-by-2 components, inspired by the approach in [2]. We
note that 2-by-2 is the minimum size building block required
to infer both joining and branching points and strikes a good
tradeoff of inference vs. merging complexity.

C. Merging Algorithm 

Assuming knowledge of all 2-by-2 subnetwork components,
from Section V-B, we now merge them together to reconstruct
the M-by-N network. We study merging in two different
scenarios: (i) when a 1-by-N tree topology is known, which is
the same problem studied in [2]; and (ii) without knowledge
of any 1-by-N, which is new to our work. Exploiting the
accurately identified 2-by-2's, we can solve (i) exactly, which
was previously solved only approximately; and also solve (ii),
which previously had no known solution.

More precisely, our merging algorithm can identify every 
joining point, in the sense that it can localize it between 
two branching points. However, note that when there are 
several joining points in a row without any branching points 
in between, it is not possible to identify the relative locations 
of these joining points with respect to each other. In fact, this 
is the case in a tree topology. 

1) Merging a 1-by-N and 2-by-2's into a 2-by-N: In this
section, we assume that the 1-by-N from $S_1$ to $N$ receivers



Algorithm 7 Merging Algorithm: Given the two sources $S_1$
and $S_2$, a set of receivers $R_1, R_2, ..., R_N$, the 1-by-N $S_1$ tree
topology, and the 2-by-2 results from Algorithm 6 for any
pair of receivers $R_i, R_j$, this algorithm identifies a single link
for the location of every $J_i$ (the joining point for $R_i$), on the $S_1$
topology.

for each receiver R_i do
    if ∃ k < i such that the S_1, S_2, R_k, R_i 2-by-2 is shared then
        J_i = J_k
    else
        Let B be the closest branching point to R_i
        while J_i is not localized to a single link do
            Let R_j be any child of B (j ≠ i)
            Based on the type of the 2-by-2 component S_1, S_2, R_i, R_j, locate J_i above/below B
            if (J_i is below B) || ((J_i is above B) && (∄ other branching point above B on S_1's 1-by-N)) then
                J_i is localized to a single link.
                Output this link; Break;
            else
                B = the next upstream branching point
            end if
        end while
    end if
end for



is known, using any of the classic methods for single-tree 
topology inference, e.g., see fU, or our algorithms in Sec- 
tion [IV] for tree networks. This 1-by-N is a tree rooted at Si 
and contains only branching points. We also assume that the
2-by-2's between $S_1$, a new source $S_2$, and any pair of receivers
are known, using the algorithms of Section V-B. Our goal is
to locate the joining points where paths from $S_2$ to the same
$N$ receivers join $S_1$'s topology. We use the assumptions of
Section III for routing.

This problem was posed in [2], [40] and solved there in an
approximate way. Bounds on the joining points' locations in the
$S_1$ topology were provided within a sequence of consecutive
logical links. This was a result of the fact that 2-by-2's are
only identified as shared or non-shared types in [2], [40].

In contrast, we design Algorithm 7, which localizes each
joining point for each receiver to a single logical link, between
two branching points in the $S_1$ topology. Our algorithm is
simpler, faster, and more accurate: it can identify all joining
points for any topology and with lower complexity, thanks to
our complete knowledge of the 2-by-2 types.

Example 4: Fig. 6(a) depicts a 2-by-9 topology constructed
based on the Abilene network [41]. Consider $R_1$: it forms a
type 1 2-by-2 with $R_2$. Thus $J_1$ must lie above $B_{1,2}$, so that
there exists a unique path from each source to $R_1$. But then
$B_{1,3}$ is on the way. $R_1, R_3$ form a 2-by-2 of type 4, thus $J_1$
must be below $B_{1,3}$. Now $J_1$ is localized to one link and the
algorithm ends here for $R_1$. Other receivers are considered
similarly. Note that a joining point can be placed on any link
from the receiver to $S_1$; thus, the number of steps required to
localize a joining point is at most the height of the $S_1$ tree.
Also, when there is a group of receivers within which all pairs
are of type 1, the algorithm is run once and assigns the same
joining point to all of them. For this example, the algorithm in
[2] cannot completely resolve all joining points and provides
bounds within a sequence of several logical links instead. ■
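
The upward walk of Example 4 can be sketched as below. The sketch takes the above/below decision as an input: `position_of` is a hypothetical oracle that reports whether the joining point lies above or below a given branching point (in the paper this follows from the 2-by-2 type; the exact mapping from type to position is not reproduced here). The labels `"receiver"` and `"source"` are placeholders for the two ends of the path.

```python
def localize_joining_point(branching_points, position_of):
    """Walk up the S1 tree from a receiver until J is pinned to one link.

    branching_points: branching points on the receiver's path to S1,
                      ordered from the closest to the receiver upwards.
    position_of:      callable B -> "above" or "below" (assumed oracle).
    Returns the (lower, upper) end nodes of the logical link holding J.
    """
    lower = "receiver"
    for b in branching_points:
        if position_of(b) == "below":
            return (lower, b)     # J sits between the previous node and B
        lower = b                 # J is above B: move to the next upstream B
    return (lower, "source")      # above the top-most branching point


# Example 4 for R1: J1 is above B12 (type 1 with R2), below B13 (type 4 with R3).
decisions = {"B12": "above", "B13": "below"}
print(localize_joining_point(["B12", "B13"], decisions.get))
```

The loop visits each branching point on the path at most once, which matches the remark that the number of steps is bounded by the height of the $S_1$ tree.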






2) Merging 2-by-2's into a 2-by-N: In this section, we infer
a 2-by-N without prior knowledge of any 1-by-N. Inference
under this relaxed assumption is enabled by our exact knowledge
of 2-by-2's and was not possible before [2], [40]. We
first send probes over the 2-by-N and then we merge all $\binom{N}{2}$
2-by-2 components, as described next.

Example 5: We first consider all shared (type 1) 2-by-2
components and assign them the minimum number of branching
and joining points required. For example, in Fig. 6(a),
$B_{1,2}$, $B_{3,4}$ and $J_1 = J_2$, $J_3 = J_4 = J_5 = J_6 = J_7 = J_8 = J_9$
are identified in this step. Second, we consider all non-shared
2-by-2 topologies (of type 2, 3, or 4). We use the information
about the locations of the branching and joining points in each
type to: (1) add the minimum number of branching points
required to the ones already identified from the shared pairs;
and (2) assign joining points to those receivers that have not
been already assigned one. In the example of Fig. 6(a), an
additional branching point $B_{1,3}$ is required, which is connected
to both joining points $J_1 = J_2$ and $J_3 = J_4 = J_5 = J_6 =
J_7 = J_8 = J_9$, to satisfy the 2-by-2's of type 4 between the
two shared groups. No additional joining point is required in
this example. ■
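
The first step of Example 5, grouping receivers that share a joining point, can be sketched with a union-find structure. The pairwise type table below is made up to mirror the example (receivers 1 and 2 in one shared group, receivers 3 to 9 in another, type 4 across groups), and the function name is hypothetical.

```python
from itertools import combinations


def shared_groups(receivers, pair_types):
    """Group receivers linked by type 1 (shared) 2-by-2 components.

    pair_types: dict mapping a receiver pair (a, b) to its 2-by-2 type.
    Receivers in the same group are assigned a common joining point.
    """
    parent = {r: r for r in receivers}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for (a, b), kind in pair_types.items():
        if kind == 1:
            parent[find(a)] = find(b)

    groups = {}
    for r in receivers:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())


receivers = list(range(1, 10))
types = {(a, b): (1 if (a <= 2) == (b <= 2) else 4)
         for a, b in combinations(receivers, 2)}
groups = shared_groups(receivers, types)
print(groups)
```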

This approach identifies the locations of all joining points,
between the $S_1$ and $S_2$ 1-by-N topologies, but does not
identify all the branching points in the $S_1$ tree topology. Only
the "minimum" $S_1$ topology is identified, i.e., the tree made
by the "necessary" branching points. We define as "necessary"
branching points the ones located below a joining point of $S_1$
and $S_2$ in the 2-by-N. An "unnecessary" branching point is
the child of another branching point with no joining point
in between. This approach does not identify $B_{4,5}$, $B_{6,7}$, $B_{8,9}$,
and directly connects their children ($R_4, R_5, R_6, R_7, R_8, R_9$)
to the upstream branching point ($B_{3,4}$).

Note that the worst case input for this approach is a tree 
network. Since all 2-by-2's are of type 1, and the algorithm 
cannot reconstruct branching points in a row, it can only iden- 
tify the top-most branching point of the entire tree structure. 

3) From 2-by-N to M-by-N: We can directly extend the 2- 
by-N inference techniques to the M-by-N case iffOl . We start 
from a 2-by-N topology, and add one source at a time, to
connect the 1-by-N's of the remaining $M - 2$ sources. Assume
that we have constructed a $k$-by-$N$ topology, $2 \le k < M$. To
add the $(k + 1)$-th source, we perform $k$ experiments, where in
each experiment a different one of the $k$ sources and the
$(k + 1)$-th source send $x_1$ and $x_2$. We then "glue" these topologies
together by following the topological rules of Section V-C1.
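
The incremental construction implies a simple schedule of source pairs. The sketch below enumerates it under our own convention that sources are numbered 1 to M and that the initial 2-by-N uses sources 1 and 2; the function name is hypothetical.

```python
def source_pair_schedule(num_sources):
    """List the source pairs probed when growing a 2-by-N into an M-by-N."""
    schedule = [(1, 2)]                       # the initial 2-by-N topology
    for k in range(2, num_sources):           # a k-by-N is already built
        new_source = k + 1
        # k experiments: each existing source is paired with the new one
        schedule.extend((existing, new_source) for existing in range(1, k + 1))
    return schedule


plan = source_pair_schedule(4)
print(len(plan), plan)
```

In total every pair of sources is probed once, i.e., M(M - 1)/2 pairwise experiments for M sources.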

4) Complexity of Merging: 

Lemma 5.1: Identifying at least $N - 1$ 2-by-2 components
is necessary for Alg. 7 to be able to identify all joining points.
Proof: The main idea is provided in the following example,
which is chosen to demonstrate the worst case. Alg. 7
requires the maximum number of 2-by-2's to localize all the
joining points in this example topology.

Consider Fig. 5. Algorithm 7 starts from $R_1$, and considers
the 2-by-2 type of $R_1, R_2$ first. Assume that $R_1, R_2$ form a
2-by-2 of type 1, and thus, their joining point lies above $B_{1,2}$.
Therefore, we need to continue to the upper branching points.
In the next step, we check the 2-by-2 type of $R_1$ with one




Fig. 5. Counting the 2-by-2's required for merging Algorithm 7 to uniquely
localize all joining points.

child of $B_{1,i}$, e.g., $R_i$. Assume that the joining point for $R_1$
still lies above $B_{1,i}$; thus, we need to continue until the highest
branching point and consider $R_1$ with one child of $B_{1,N}$, i.e.,
$R_N$. The joining point for $R_1$ (also $R_2$, $R_N$) is finally localized
above $B_{1,N}$. We now need to find the joining point location
for any other receiver $R_i, R_{i+1}, ..., R_k$. Each $R_i$ needs to be
considered with one child of any of its upper branching points
until its joining point is identified, which can be up to $B_{1,i}$ at
most, because otherwise it would be shared with $R_1$. Assume
that in the worst case, $R_i$ needs to be considered with all of
them, i.e., $R_i R_{i+1}, ..., R_i R_k$, and its joining point lies directly
below $B_{1,i}$. For any of the remaining receivers, we need to
continue in the same way; their joining points will lie below
their branching point with $R_i$, because otherwise, they would
be the same as $J_i$.

The following matrix shows the list of all 2-by-2's that are
required to identify all joining points. It is an N × N
upper-triangular matrix, where each row or column corresponds
to one of the N receivers. The non-zero elements may represent
any of the four 2-by-2 types. However, we have only
indicated the ones we require (with yes) and the ones we do
not require (with no).



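[Matrix: N × N upper-triangular array with one row and one column per receiver; each entry above the diagonal is marked yes (2-by-2 required) or no (2-by-2 not required).]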



Intuitively, the list of the 2-by-2's required to identify the
joining point for each receiver is separate from any other
receiver. Therefore, the total number of required 2-by-2's is no
more than the length of the longest row in the matrix above
(i.e., the first row), which is N - 1. ■

Note on Lemma 5.1: If the 2-by-2's are properly selected,
N - 1 is sufficient. Unfortunately, we cannot know in advance
(i.e., without knowledge of the 2-by-N topology) which
2-by-2's to choose out of all $\binom{N}{2}$ possible 2-by-2's, so as to
uniquely localize the joining points between two branching
points. Nevertheless, from the given $S_1$ 1-by-N, we can give
an upper bound on the number of 2-by-2 types required. Since
(a) The Abilene topology [41]. (b) An Erdős-Rényi random graph. (c) A preferential attachment graph.

Fig. 6. Three different topologies used to test our inference algorithms in simulation.



every receiver needs to be checked with other receivers that
are children of its upper branching points, up to the location
of its joining point, we need to check for $O(N \log N)$
2-by-2's. This is less than identifying all $\binom{N}{2}$ 2-by-2's. Note that
we still need to multicast $x_1, x_2$ to all receivers and monitor
all observations; but we can use only the observations of the
selected 2-by-2's for inference and ignore the rest.
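
To get a feel for how the three counts discussed above compare, the following sketch tabulates them for a given number of receivers. The function name `two_by_two_budgets` is ours, and $N \log_2 N$ is used only as a stand-in for the $O(N \log N)$ order (the constant and the base of the logarithm are assumptions), so the middle figure is indicative rather than exact.

```python
import math


def two_by_two_budgets(num_receivers):
    """Return the three 2-by-2 counts discussed in the text for N receivers.

    best_case : N - 1, sufficient if the 2-by-2's are properly selected
    n_log_n   : ceil(N * log2(N)), a stand-in for the O(N log N) order
    all_pairs : C(N, 2), i.e., every possible 2-by-2
    """
    n = num_receivers
    if n < 2:
        raise ValueError("need at least two receivers")
    return {
        "best_case": n - 1,
        "n_log_n": math.ceil(n * math.log2(n)),
        "all_pairs": math.comb(n, 2),
    }
```

For N = 64 this gives 63, 384, and 2016 respectively; the gap between the last two widens as N grows, while for very small N the stand-in can exceed the number of pairs.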

Lemma 5.2: The complexity of Algorithm 7 is $O(N)$.
Proof: The algorithm starts from a receiver and proceeds 
to its upper branching points until its joining point is localized 
below one of the branching points or above the highest one. At 
each step, the algorithm uses information from one 2-by-2 to 
improve its estimate of the joining point location. Therefore, 
the total number of steps performed by the algorithm equals 
the number of required 2-by-2's given in the matrix above. 
There are also some repeated checks of the same 2-by-2's 
for those receivers that are shared with previously considered 
receivers. However, the complexity of the algorithm is still in 
the order of O(N). ■ 

Finally, one can see that the second merging algorithm needs
to know all $\binom{N}{2}$ 2-by-2's, and thus also takes $\binom{N}{2}$ steps.



VI. Simulation Results 

We now simulate our Inference algorithms⁷ in some representative
topologies that exemplify different characteristics.

A. Trees 

1) Simulation Setup: Consider the binary tree example of
Fig. 1(a). Assume that all links have the same loss probability
p ∈ [0, 10%]. We simulate Alg. 2 and we send up to M =
10 probes per iteration. We conservatively consider an error
to be any divergence from the true topology. The results are
averaged over 10,000 realizations of the loss process.

2) Simulation Results: Fig. 7 shows the percentage of
inference errors in each of the first two iterations (shown in
Fig. 1(b) and Fig. 1(c)) as a function of p and M. As expected,
the probability of error increases with p, since packet losses
may lead to the misclassification of a leaf to the incorrect

7 We note that, in both our approach and in past work [2], [3], the error
in identifying the 2-by-2's, in the first step, may propagate to the Merging
algorithm, in the next step. However, there is no additional error introduced
by the Merging algorithm itself, and thus no need to simulate it.




(a) Iteration 1, which infers the topology in Fig. 1(b). (b) Iteration 2, which infers the topology in Fig. 1(c).

Fig. 7. Probability of incorrect inference for the binary tree of
Fig. 1(a) as a function of the loss probability p (same for all links)
and of the number of probes M per iteration.
Fig. 8. Lossless case. The error probability vs. the number of experiments
for the three topologies in Fig. 6; 1000 realizations.

(a) Simulation results for the Abilene topology in Fig. 6(a). (b) Simulation results for the random topology in Fig. 6(b). (c) Simulation results for the preferential topology in Fig. 6(c).

Fig. 9. Lossy case. The error probability vs. the loss rate for different values of countMax for the three topologies in Fig. 6; 10,000 realizations.



component. For a fixed number of probes per iteration and
fixed loss rate p, the error probability decreases with the
iterations. Also, the probability of error decreases rapidly with
the number of probes M per iteration: it decreases significantly
even with M = 2, 3, even for large p, and becomes practically
zero for M > 5.⁸
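
One way to see why a handful of probes already helps is to compute how likely it is that a receiver gets at least one probe intact. The sketch below does this under assumptions that are ours, not the paper's: losses are independent per link and per probe, and `path_length` (the number of links a probe crosses) is a hypothetical parameter. It is not the error metric plotted in Fig. 7, only a back-of-the-envelope companion to it.

```python
def prob_at_least_one_probe(loss_prob, path_length, num_probes):
    """Probability that at least one of `num_probes` probes survives a path.

    Each of the `path_length` links drops a probe independently with
    probability `loss_prob` (an assumption made for this sketch).
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must lie in [0, 1]")
    per_probe_success = (1.0 - loss_prob) ** path_length
    return 1.0 - (1.0 - per_probe_success) ** num_probes
```

With a loss rate of 10% per link and a three-link path, a single probe arrives with probability about 0.73, whereas five probes leave less than a 0.2% chance that none arrives.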

B. Multiple-Tree Topologies (DAGs) 

We simulate our algorithms in example multiple-tree networks.
In summary, we show that (i) our approach significantly
improves over [2], [3], [40] in terms of the number of
experiments required to identify the type of all 2-by-2's as well
as of the associated probability of error; (ii) the probability
of error in identifying the 2-by-2's depends on the underlying
topology. In particular, it is smaller for preferential attachment
graphs as compared to ER random graphs.

1) Simulation Setup: To demonstrate (i), we consider
Fig. 6(a), which shows the Abilene topology [41], with two
sources located at the Chicago and Indiana nodes, and nine
receivers, each located at one of the other core network nodes.
This is the same topology considered in [2]. To investigate
(ii), we consider Fig. 6(b) and Fig. 6(c). Fig. 6(b) shows a
random topology with 2 sources and 7 receivers generated by
LEDA [42]. Fig. 6(c) shows a preferential attachment topology
generated by Brite [43]. We pick 2 sources and 8 receivers and
we select the route for every source-destination pair, according
to our assumptions in Section III.

We run Alg. 5 and Alg. 6 in the absence and presence of
packet loss, respectively, and we compute the error. In the
lossless case, we identify the 2-by-2 types and report the
error as a function of the number of experiments countMax.
The only possible error is to falsely declare a type 4 as type 1.
In the lossy case, we also report the error assuming that there
is packet loss in the network (with prob. p independently on
every link), and after applying Algorithm 6 to each topology.
An error in this case can result either from declaring type

8 This second observation is due to the fact that one correctly received
packet is sufficient for the correct operation of Alg. 2. E.g., if a node receives
a mixture of $x_1$ and $x_2$ probes, it will be correctly assigned to component $\mathcal{L}_3$
even if some probes are lost. In contrast, methods that require each receiver to
receive enough probe packets to infer the loss rate associated
with the network links with a certain accuracy require a larger number of
probes for statistical significance.



2 or 3 or 4 as type 1; or from declaring type 4 as type 2
or 3. We consider values of p ∈ [0, 25%] and countMax =
100, 200, 250.

We assume that individual link delays have a fixed part
of 5-10 ms (propagation delay), and a variable part, which is
exponential with a maximum of 10 ms (queueing delay). We
choose a large time window W = 100 ms. The offset u is
drawn uniformly at random from [50, 100] ms, i.e., $u/W \geq 1/2$.
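
The delay model above can be sampled along the following lines. Several details are our assumptions rather than statements from the text: the fixed part is drawn uniformly from 5-10 ms, "exponential with a maximum of 10 ms" is read as an exponential draw capped at 10 ms, and the mean of that exponential (`mean_queueing_ms`) is a hypothetical parameter, since the text does not give one.

```python
import random


def sample_link_delay_ms(rng, mean_queueing_ms=3.0):
    """Draw one link delay: fixed propagation part plus capped queueing part.

    rng              : a random.Random instance (keeps runs reproducible)
    mean_queueing_ms : assumed mean of the exponential part (not from the text)
    """
    fixed = rng.uniform(5.0, 10.0)                                   # propagation
    variable = min(rng.expovariate(1.0 / mean_queueing_ms), 10.0)    # queueing, capped
    return fixed + variable


def sample_offset_ms(rng):
    """Draw the offset u uniformly from [50, 100] ms."""
    return rng.uniform(50.0, 100.0)
```

Under this reading, every link delay falls between 5 ms and 20 ms, which is small compared to the window W = 100 ms.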

2) Simulation Results: Fig. 8 reports the results for the
lossless case, and Fig. 9 for the lossy case, for all three
topologies of Fig. 6.

Let us first discuss the Abilene topology, shown in Fig. 6(a).
In the lossless case, the Abilene curve in Fig. 8 shows that
the error probability decreases very rapidly with countMax
and reaches 0 at countMax ≈ 150. In the lossy case,
as shown in Fig. 9(a), the error probability also decreases
rapidly with countMax: it becomes negligible with 200-250
experiments. This is a significant improvement over [2] for
the same example topology: they used 1000 measurements to
distinguish only between type 1 and the other three types, for
very small loss rates of up to 1.5%, and they achieved error
probability 10%. In contrast, with an order of magnitude fewer
probes, we distinguish among all four types, and we have a
very small error probability for larger loss rates (up to 25%).
Note that the error probability is not monotonic in p: for
small loss rates, Algorithm 6 results in more erroneous cases
while Algorithm 5 could give better results. The effect of loss
is to increase the number of observations of all three groups.
However, for moderate loss rates, we get enough observations
of group (iii), thus a small error probability. For larger loss
rates, the increase in the observations of group (i), which we
ignore, increases the error probability again, especially for
small countMax.

Let us now consider random graphs, in particular Erdős-Rényi
(ER) vs. preferential attachment, depicted in Fig. 6(b)
and Fig. 6(c), respectively. In the lossless case, we can see
in Fig. 8 that the error probability decreases very rapidly
with countMax and reaches 0 at countMax ≈ 150. We
also observe that, for the same number of experiments, the
error is generally smaller in the topology generated using the
preferential attachment model than in the one generated using
the ER model. This is true both in the lossless (Fig. 8) and in
the lossy (Fig. 9(b) and Fig. 9(c)) case.

VII. Our Work in Perspective 

In this section, we revisit the related work, briefly outlined
in Section II, but now focusing on the most closely related
parts. We provide an in-depth comparison of the trade-offs in- 
volved and we draw connections between different approaches. 
We outline possible extensions that can unify active and 
passive tomography with network coding and directions for 
future work. 



A. Comparison to Traditional Tomography w/o Network Coding

Within the large literature on network tomography, the
most closely related work is the Multiple Source Network
Tomography in [2], [3], [40], which formally defines the
M-by-N tomography problem. Our work on DAGs builds on
[2], [40]: we follow their graceful approach of decomposing
the M-by-N into a number of 2-by-2 components, inferring
the type of each 2-by-2, and then merging them
to reconstruct the M-by-N. Using simple network coding
operations at intermediate nodes provides a graceful way to
reveal coding points, which has typically been a challenge in
traditional tomography. Our work improves upon [2], [40] in
that: (i) it can exactly identify the type of a 2-by-2, as opposed
to just distinguishing between shared and non-shared types; and
(ii) the merging algorithms can precisely localize the
joining points with respect to the branching points, as opposed
to providing bounds.

Simulation results in Section VI, on the same topology used
in [2], showed that our approach is more accurate, with fewer
experiments. In essence, our approach is deterministic (one
observation suffices to distinguish among types) as opposed to
probabilistic (which needs to collect a large number of probes
for statistical significance). This benefit comes at the cost of
having intermediate nodes do some operations. However, these
operations are so simple (just additions) that they can be
thought of as inverse multicast. This cost can be removed
if our approach is implemented as passive on top of random
network coding, as outlined in Section VII-C.

B. Comparison to Passive Tomography with Network Coding 

Recently, a passive approach for topology inference on
top of random network coding has been proposed in [25],
[28], [29]. The probes are sent only once, and intermediate
nodes pick coding coefficients β uniformly at random out
of a large field $\mathbb{F}_q$. The key idea is that, under assumptions
of strong connectivity and a large enough finite field $\mathbb{F}_q$, the
transfer matrix M, from the sender to the receiver, is distinct
for different networks, w.h.p. Then, using the observations

9 The reason is that in the preferential attachment topologies, we have a
few nodes with very high degree and many nodes with a low degree. As a 
result, we have a large number of receivers with a shared joining point and 
some other receivers with distinct joining points. In contrast, in ER graphs, 
we have several roughly equally-sized groups of shared receivers, where each 
group forms a non-shared type with another group. Therefore, we have more 
topologies of type 1 in preferential attachment, which results in smaller error. 



Y at the receivers and the source messages X, exhaustive 
enumeration of all possible topologies is used to find an M 
that matches Y = MX. (Note that M is directly observed 
at the receiver because the source adds unit vectors at the 
beginning of its message.) 
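
The parenthetical remark about unit vectors can be illustrated with a small sketch: if the source messages carry an identity header, the header columns of what the receiver sees are exactly the transfer matrix. The helper names below are ours, arithmetic is done modulo a prime `q` as a stand-in for the field $\mathbb{F}_q$, and the convention $Y = MX$ with one source message per row of $X$ is an assumption made for the illustration.

```python
def mat_mul_mod(a, b, q):
    """Multiply matrix a (r x k) by matrix b (k x c) over the integers mod q."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) % q for j in range(cols)]
        for i in range(len(a))
    ]


def prepend_unit_vectors(payload):
    """Build X = [I | payload]: message i gets the i-th unit vector as header."""
    k = len(payload)
    return [[1 if j == i else 0 for j in range(k)] + list(payload[i]) for i in range(k)]


def read_transfer_matrix(received, k):
    """The first k columns of Y = M X equal M, because M times I is M."""
    return [row[:k] for row in received]
```

For instance, with `q = 5`, a transfer matrix `[[2, 3], [1, 4]]`, and any two payload rows, applying `read_transfer_matrix` to the received matrix returns `[[2, 3], [1, 4]]` regardless of the payload.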

Our approach is different in that it is active and uses 
several probes but simple coding operations over a small field. 
Furthermore, we do not require the end-points to have any a- 
priori knowledge of identity or operations of the intermediate 
nodes. 

Example 6: To better illustrate the differences, let us consider
a 2-by-2, and let us try to infer its type using the two
approaches. The transfer matrices corresponding to the four
types of a 2-by-2, shown in Fig. 3, are provided in Fig. 10.
The approach in [25], [29] tries to distinguish among these
four matrices in a single experiment.
In contrast, we send probe packets in multiple rounds. In
each experiment, the β's are either 0 or 1 (since we do additions
only), and we exclude some of the possible topologies in each
experiment, until we are left with only one unique topology,
in at most countMax experiments. ■

Our countMax experiments can be thought of as collecting
observations $Y_1 = M_1X, Y_2 = M_2X, \ldots, Y_{countMax} = M_{countMax}X$,
where $M_1, M_2, \ldots, M_{countMax}$ are different
representations of that unique M. Note that, although M is
unique in terms of β's for each topology, it can be shown
to be non-unique when these β's are replaced by 0/1 values.
For example, the transfer matrices for types 1 and 4 seem
to be the same when all β's are equal to 1, but only type 4
can potentially result in M = [1,1; 0,1]. We send probes in
multiple experiments to create those representations of M that
help us uniquely identify the underlying topology.

In terms of finite field size, [25], [29] need a much
larger finite field than ours, in order to get distinct transfer
matrices for different topologies.¹⁰ In terms of bandwidth,
our approach uses smaller packets in a single experiment,
since our operations are performed over a smaller field and
a few experiments are required. Furthermore, in [25], [29] the
coefficients, sent anyway along with the packets through the
network, are used to reveal the topology from the transfer
matrix at the receiver, and can be thought of as the equivalent
of probes. The distinction between the active and passive
approaches becomes even less pronounced if we consider that
[25], [29] require the receiver to have a-priori knowledge of
the size of the network, and of the code-book used at each node
(referred to as "common randomness"), which depends on the
node ID [28]. In contrast, we do not require such knowledge
and we infer the topology with smaller complexity.
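
To make the field-size requirement concrete, the sketch below evaluates the lower bound attributed to [25] in the accompanying footnote, as we read it: $1 - |V|^{4|E|}\left(1 - (1 - 1/q)^{|V|}\right)$. The reading of the formula and the function name `distinctness_lower_bound` are our assumptions; exact rational arithmetic is used so that very large values of q do not suffer from rounding.

```python
from fractions import Fraction


def distinctness_lower_bound(num_nodes, num_edges, q):
    """Evaluate 1 - |V|^(4|E|) * (1 - (1 - 1/q)^|V|) exactly.

    A negative result means the bound is vacuous for that field size.
    """
    miss = 1 - (1 - Fraction(1, q)) ** num_nodes
    return 1 - Fraction(num_nodes) ** (4 * num_edges) * miss
```

Even for a toy network with 4 nodes and 4 edges, this expression is negative (hence uninformative) for q = 3 and only exceeds 0.98 once q is on the order of $10^{12}$, which illustrates how quickly q has to grow under this bound.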

Note that in the general 2-by-N case, similarly to the 2-by-2 
case, we have a 2-by-N transfer matrix M, with each column 
corresponding to one of the receivers. We continue sending 
probes until we get a unique 2-by-2 topology for every pair

10 In [25], it was shown that if the local coding variables are i.i.d. uniform
r.v. over $\mathbb{F}_q$, then the probability that all different unicast networks with at
most |V| nodes and at most |E| edges will have distinct transfer matrices is at least
$1 - |V|^{4|E|}\left(1 - (1 - \frac{1}{q})^{|V|}\right)$. This indicates that (i) the probability of success
goes to 1 iff q → ∞, and (ii) q needs to increase rapidly as the size of the
network grows. In contrast, as we see in Section IV-B, our approach requires
only a small field $\mathbb{F}_3$ to distinguish among different 2-by-2 topologies. We can
calculate that if β's are chosen uniformly at random from $\mathbb{F}_3$, then $\Pr(M_4 = M_1) = 0.04$,
which is not negligible for these very small networks.
Fig. 10. Comparison of our approach to [29] in the example of a 2-by-2 topology. $M_1, M_2, M_3, M_4$ are the transfer matrices resulting from the four different
types (types 1, 2, 3, 4 in Fig. 3) of a 2-by-2, if intermediate nodes use coding coefficients β. The approach in [25], [29] tries to distinguish among these four
M's in a single experiment. In contrast, we use β's that are either 0 or 1 and multiple experiments to choose an M.

TABLE III
Inference with random network coding at intermediate nodes. Possible observations for all four types of 2-by-2 topologies when $J_1$ and $J_2$ use coding coefficients (1,1) and (2,3), respectively. (Lossy observations are omitted due to lack of space.)

Obs.  Type 1              Type 2                   Type 3                   Type 4
#     (group (ii) obs.)   (group (iii) obs.)       (group (iii) obs.)       (group (iii) obs.)
      R1        R2        R1        R2             R1             R2        R1        R2
1     x1+x2     x1+x2     x1+x2     2(x1+x2)       2(x1+x2)       x1+x2     x1+x2     2x1+3x2
2     x1        x1        x1        2x1            2x1            x1        x1        2x1
3     x2        x2        x2        3x2            3x2            x2        x2        3x2
4     -         -         x1+x2     2(x1+x2)+3x2   2(x1+x2)+3x2   x1+x2     x1        2x1+3x2
5     -         -         x1        2x1+3x2        2x1+3x2        x1        x1+x2     2x1
6     -         -         x1        3x2            3x2            x1        x1        3x2
7     -         -         x1+x2     3x2            3x2            x1+x2     x2        2x1
8     -         -         -         -              -              -         x1+x2     3x2
9     -         -         -         -              -              -         x2        2x1+3x2



of columns, which can be treated as an independent 2-by-2
transfer matrix (as described above). After all $\binom{N}{2}$
2-by-2's are identified, we use the merging algorithm that in
each step (i.e., every time one J is localized) evolves F, the
adjacency matrix of the line graph,¹¹ as follows.

By localizing each joining point, more edges (as rows and
columns) are added to F. In Section IV-C1, we have part of
F from the assumption about the knowledge of the $S_1$
1-by-N tree topology. As we localize each joining point to a
single logical link between two branching points, we break
the original edge in the $S_1$ topology into two parts, separated
by the identified joining point. We also add the corresponding
edges for the paths from $S_2$ to the joining point, and then to
the receivers. This is shown in the following example. Note
that in Section IV-C2, we do not have any part of F in advance
and we build F from scratch by finding all the edges from our
2-by-2 information.

Example 7: Consider the simple 2-by-3 network in
Fig. 11. We start from the $S_1$ topology represented
by the following F, corresponding to the edges
$S_1B_{1,2}$, $B_{1,2}B_{2,3}$, $B_{1,2}R_1$, $B_{2,3}R_2$, $B_{2,3}R_3$:



"Under random linear network coding, the transfer matrix M can be 
written as $M = A(I - F)^{-1}B^T$ [44]. A and $B^T$ represent the linear mixing
of the input and output random processes, respectively. Therefore, they do not
substantially contribute to M. F is the |E| × |E| adjacency matrix of the line
graph, where |E| denotes the number of edges in the graph; F completely
describes the topology.




Fig. 11. A simple 2-by-3 topology. F evolves using the 2-by-2 information. 



$$F = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$







Here the required 2-by-2 types are $S_1, S_2 - R_1, R_2$ and
$S_1, S_2 - R_2, R_3$. Since $S_1, S_2 - R_1, R_2$ is of type 1, we identify
$J_1 = J_2$ on $S_1B_{1,2}$. Thus, we divide $S_1B_{1,2}$ into $S_1J_1$ and
$J_1B_{1,2}$. Since $S_1, S_2 - R_2, R_3$ is of type 2, we identify $J_3$
on $B_{2,3}R_3$. Thus, we divide $B_{2,3}R_3$ into $B_{2,3}J_3$ and $J_3R_3$.
If all three 2-by-2's were of type 1, then $S_2$ would share the
branching point $B_{1,2}$ with $S_1$. But now we need to consider $B_2$
for $S_2$ and add all the branches to the joining points $J_1 = J_2$
and $J_3$ from $B_2$. Therefore, we add $S_2B_2$, $B_2J_1$, and $B_2J_3$
Algorithm 8 Algorithm 7 in terms of evolving the adjacency
matrix of the line graph, F.

for each edge $e_k$ breaking into two edges do
    add one row $e_{k+1} = [0, 0, \ldots, 0]$ to $F_{i-1}$, which is of length $size(F_{i-1}) + 1$,
    add one column $e_{k+1}$,
    transfer all 1 elements from row $e_k$ to row $e_{k+1}$ in the new matrix $F_i$, and only make the $(e_k, e_{k+1})$ element 1.
end for



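to F; where $B_2J_1$ meets with $S_1$'s routes to $R_1$ and $R_2$ at $J_1$,
and $B_2J_3$ meets with $S_1$'s path to $R_3$ at $J_3$. Thus, we get the
final matrix F, corresponding to the topology in Fig. 11 with
edges $S_1J_1$, $S_2B_2$, $B_2J_1$, $B_2J_3$, $J_1B_{1,2}$, $B_{1,2}B_{2,3}$, $B_{2,3}J_3$,
$B_{1,2}R_1$, $B_{2,3}R_2$, and $J_3R_3$ (rows and columns in this order):

$$F = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$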
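
The matrices in this example can be cross-checked by building the line-graph adjacency matrix directly from an edge list. The sketch below does so under our own conventions: each edge is a (tail, head) pair of node names, and entry (i, j) is 1 when edge i ends at the node where edge j starts. The node labels and the edge ordering are assumptions read off the text of the example, and `line_graph_adjacency` is a hypothetical helper name.

```python
def line_graph_adjacency(edges):
    """Return F with F[i][j] = 1 iff the head of edge i is the tail of edge j.

    edges: list of (tail, head) node-name pairs, in the row/column order wanted.
    """
    return [[1 if ei[1] == ej[0] else 0 for ej in edges] for ei in edges]


# Edges of the S1 tree before any joining point is localized.
S1_TREE_EDGES = [
    ("S1", "B12"), ("B12", "B23"), ("B12", "R1"), ("B23", "R2"), ("B23", "R3"),
]

# Edges of the final 2-by-3 topology, in the order listed in the example.
FINAL_EDGES = [
    ("S1", "J1"), ("S2", "B2"), ("B2", "J1"), ("B2", "J3"), ("J1", "B12"),
    ("B12", "B23"), ("B23", "J3"), ("B12", "R1"), ("B23", "R2"), ("J3", "R3"),
]
```

Calling `line_graph_adjacency(S1_TREE_EDGES)` yields a 5 × 5 matrix whose only non-zero rows are the first two, and `line_graph_adjacency(FINAL_EDGES)` yields a 10 × 10 matrix with ten non-zero entries.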
In general, any 2-by-2 of type 3 or 4 limits the upper
bound on the joining point location for the receiver under
consideration, and thus breaks the current edge $e_k$ into two
consecutive edges. Assume that we have $F_{i-1}$ at the current
step, the 2-by-2 type we have considered breaks edge $e_k$ into
two edges, and we want to find $F_i$ from $F_{i-1}$. We need to
do the following steps: (i) add one row $e_{k+1} = [0, 0, \ldots, 0]$ to
$F_{i-1}$, which is of length $size(F_{i-1}) + 1$; (ii) also add one
column $e_{k+1}$; (iii) transfer all 1 elements from row $e_k$ to
row $e_{k+1}$ in the new matrix $F_i$, and only make the $(e_k, e_{k+1})$
element 1. These steps are summarized in Alg. 8.
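
Steps (i)-(iii) can be sketched as a small function operating on a list-of-lists matrix. This is our reading of the steps, not the authors' implementation: `split_edge` is a hypothetical name, and for simplicity the new edge is appended as the last row and column rather than placed right after $e_k$, which changes only the ordering of edges and not the graph.

```python
def split_edge(F, k):
    """Split edge k of the line-graph adjacency matrix F into two edges.

    Edge k keeps its incoming connections and becomes the upper part; a new
    edge (index len(F)) becomes the lower part and inherits the successors
    that edge k used to have. Returns a new matrix; F is left untouched.
    """
    n = len(F)
    new = [row[:] + [0] for row in F]      # step (ii): add one column
    new.append([0] * (n + 1))              # step (i): add one all-zero row
    for j in range(n):                     # step (iii): move the 1's of row k
        if new[k][j] == 1:
            new[n][j] = 1
            new[k][j] = 0
    new[k][n] = 1                          # edge k now feeds only the new edge
    return new
```

For example, splitting the first edge of the 5-edge $S_1$ tree produces a 6 × 6 matrix in which the first row points only to the new edge, and the new edge points to the two edges that originally followed the first one.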

C. Extension to Passive Tomography with RNC 

We now discuss how our approach can potentially be 
extended to be implemented as passive, when random network 
coding (RNC) is used in the middle. The intuition is that the 
same algorithms for topology inference should apply if we 
ensure that the RNC coefficients satisfy necessary conditions 
for inference. 

Assume that in each experiment, the intermediate nodes 
perform random linear network coding operations instead of 
the simple additions assumed so far. In this case, Algorithm 6
still works if we assign coding coefficients to the joining points 
in a partial order, so as to ensure that the minimum coding 
coefficient of a joining point is always greater than or equal to 
the maximum coding coefficient of its ancestor joining point. 
Under this condition, we can prove that the same rationale as
in Section IV-B2 still holds, i.e., type 1 always results in similar



observations; in type 2, we have $c_{12} - c_{22} < 0$; in type 3, we
have $c_{12} - c_{22} > 0$; and in type 4, both cases are possible.

Proof: As we can see in Fig. 3(b) and (c), after the
equal chance of the two packets $x_1, x_2$ to meet at $J_1$, $x_2$ has
an extra chance of meeting with the resulting packet from $J_1$
at $J_2$, which would result in $c_{12} - c_{22} < 0$ in type 2, or
$c_{12} - c_{22} > 0$ in type 3. However, our assumption about the
partially ordered coefficients is still necessary; e.g., assume
that the two packets do not arrive within the same W at $J_2$
in type 2: if $x_2$ arrives within an earlier W at $J_2$ than the
resulting packet from $J_1$, then it must get multiplied by a larger
coefficient at $J_2$, rather than the one it already carries from $J_1$,
so that $c_{12} - c_{22} < 0$ still holds at the receivers. Therefore, the
condition on the partial order of coding coefficients is required
for our algorithms to be applicable to types 2 and 3. Type
4 is simply distinguishable from type 1 by getting different
observations due to two different sets of coding coefficients
at $J_1, J_2$, rather than the same coding coefficients from the
single joining point $J_1 = J_2$ in type 1. ■

Code design to jointly meet both random network coding
goals (large enough field for independent linear equations) and
tomographic goals (the aforementioned condition) is part of
future work. If such a code design is possible, Alg. 6 can
be directly applied to the case of random network coding.
For example, in Fig. 3, let $J_1$ use (1,1) and $J_2$ use (2,3) to
combine the incoming packets. Table III shows all possible
observations. We can see that $c_{12}$ is greater than, equal to, or
smaller than $c_{22}$, depending on the number of joining points a
probe packet meets on its way towards the two receivers. This
is the same rationale and pattern as in Table II.
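
The sign pattern described above suggests a simple decision rule, sketched below. It is a simplification for illustration and not Alg. 6 itself: we assume that $c_{12}$ and $c_{22}$ denote the coefficient of $x_2$ observed at $R_1$ and $R_2$ respectively, that observations with equal coefficients carry no information, and that enough experiments have been run for both signs to show up when they can. The function name `classify_two_by_two` is ours.

```python
def classify_two_by_two(observations):
    """Guess the 2-by-2 type from (c12, c22) pairs collected over experiments.

    Only the sign of c12 - c22 is used:
      no non-zero difference  -> type 1
      only negative           -> type 2
      only positive           -> type 3
      both signs seen         -> type 4
    """
    signs = set()
    for c12, c22 in observations:
        diff = c12 - c22
        if diff < 0:
            signs.add(-1)
        elif diff > 0:
            signs.add(1)
    if signs == {-1, 1}:
        return 4
    if signs == {-1}:
        return 2
    if signs == {1}:
        return 3
    return 1
```

With too few experiments, a type 4 topology may show only one sign (or none) and be reported as type 2, 3, or 1, which mirrors the kinds of errors discussed in the simulation section.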

D. Comparison to traceroute-like approaches 

In practice, the dominant approach to Internet mapping is
based on traceroute [30]-[39]. It uses traceroutes
sent between selected nodes and collects the ids of the
nodes along the paths traversed. It faces the challenges of (i)
resolving anonymous routers and router aliases and (ii) causing
congestion close to the monitoring points [37].

Similarly to traceroute, we also use active end-to-end
probes and we require some minimal co-operation from
internal nodes (simple additions in our case vs.
traceroute-specific responses). However, unlike traceroute, we do
not ask intermediate nodes to reveal their node id, which has
the advantage of preserving the anonymity of intermediate
nodes. A design difference was also noted in Section IV-B4:
we infer 2-by-2 components, instead of 1-by-1's (paths) for
traceroute.

In terms of measurement bandwidth, our approach uses
exactly one probe per link per experiment, which is the
minimum possible. This is thanks to network coding, which
combines multiple incoming packets into one, and thanks
to multicast, which replicates a single incoming packet into
many outgoing ones, thus eliminating overlap. For example, our
approach reduces the number of measured paths in a 2-by-N
topology by a factor of two, compared to traceroute; i.e.,
we require $O(N)$ instead of $O(2N)$ measurements, since each
coded packet observes two paths.
In order to provide a quantitative comparison of our approach
vs. traceroute, we compare the average number
of packets per link, in type 1 and type 4 topologies. Standard
traceroute sends three probe packets for each hop
count, in each source-destination pair. Assuming that all nodes
are responsive, traceroute results in 14.4 packets/link if the
underlying topology is of type 1, and 9 packets/link if the
underlying topology is of type 4. In contrast, in our approach,
each link is traversed by a probe packet exactly once in each
experiment. It is well-known that traceroute results in
increased overhead on links close to sources and receivers.¹²
Network coding allows flows to share bottlenecks without
competition. We note, however, that although we are efficient
in a single experiment (one probe per link) and we use small
probes, we repeat our experiments up to countMax times;
by adjusting countMax, we trade off accuracy for load.

VIII. Conclusion 

In this paper, we design active probing schemes that exploit
simple operations at the intermediate nodes to accurately infer
the network topology, based on end-to-end observations. We
design algorithms for trees and general topologies, and simulate
them in representative examples. Our main contribution
is that we show how to exploit the fundamental connection
between network coding and topology, thus adding one
new building block in the space of available options for
topology inference. The application context depends on the
capabilities and constraints of the network. We expect the
techniques developed in this paper to be most useful in
networks that are already, or can be easily, equipped with
network coding capabilities, such as overlay or wireless mesh
networks.